The AI agent infrastructure (controller, publisher pipeline, journalist) runs on Rancher Desktop but cannot be reliably reproduced from scratch. Three specific gaps exist:
No reproducibility guarantee. If Kyle resets Rancher Desktop to
factory defaults or moves to a different machine, there is no single
process to stand the agent stack back up. The current setup was built
incrementally with manual helm install and kubectl apply commands.
There is no documented, tested bootstrap path.
Inconsistent security posture. The OpenClaw deployment uses Vault for secrets, Pod Security Standards (PSS) restricted enforcement, and resource quotas. The agent controller, which runs AI agents that execute arbitrary code, stores 13 secrets in plain K8s Secrets, has no namespace-level PSS enforcement, and no resource quotas. The agent workloads are less hardened than the third-party tool they are replacing.
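For scale: the missing namespace hardening amounts to a few lines of configuration. A sketch, assuming the ai-agents namespace; the quota numbers are illustrative, not a recommendation:

```shell
# Sketch: PSS restricted labels plus a resource quota for the agent
# namespace. Quota values are illustrative assumptions.
cat > ai-agents-hardening.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: ai-agents
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ai-agents-quota
  namespace: ai-agents
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
EOF
# Applied with: kubectl apply -f ai-agents-hardening.yaml
```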
Two secrets patterns. OpenClaw uses Vault Agent Injector; the
agent controller uses K8s Secrets with env.valueFrom.secretKeyRef.
Maintaining two approaches increases cognitive load and makes security
auditing harder. AI agent pods are a higher exfiltration risk than
static workloads because they execute LLM-generated code at runtime.
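Concretely, the two patterns differ by only a few lines of pod spec. The fragments below are sketches; the secret names, role, and Vault path are assumptions, not the real config:

```shell
# Pattern A: plain K8s Secret referenced from the container spec
# (current agent-controller style). Names are illustrative.
cat > pattern-a-secretkeyref.yaml <<'EOF'
env:
- name: OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: agent-secrets
      key: openai-api-key
EOF

# Pattern B: Vault Agent Injector, driven entirely by pod annotations
# (OpenClaw style). Secrets land on an in-memory tmpfs volume, not in etcd.
cat > pattern-b-vault-inject.yaml <<'EOF'
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "agent-controller"
    vault.hashicorp.com/agent-inject-secret-openai: "secret/data/agents/openai"
EOF
```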
This matters because the agent stack is becoming Kyle's primary productivity tool. Fragile, inconsistently hardened infrastructure under autonomous AI agents is an increasing risk as the system grows.
Tear down Rancher Desktop to a bare K3s cluster, run a single bootstrap process, and have the full AI agent stack (Vault, agent controller, supporting infra) running with production-grade security hardening — all from declarative config checked into the monorepo.
Success criterion: the ai-agents namespace has
pod-security.kubernetes.io/enforce: restricted and all pods
schedule successfully under it.

Non-goal: GitOps reconciliation. Bootstrap is a helm install / script
process, not a reconciliation loop. GitOps is overkill for a
single-operator local cluster.

As Kyle, I want to run a single bootstrap process on a bare K3s cluster so that I get the full AI agent stack running without remembering manual steps.
Acceptance criteria:
As Kyle, I want all agent pod secrets to come from Vault so that secrets are audited, scoped by role, and never stored as plain K8s Secret objects.
Acceptance criteria:
As Kyle, I want the agent namespace to enforce the PSS restricted profile and resource quotas so that a misbehaving agent cannot escalate privileges or consume unbounded resources.
Acceptance criteria:
ai-agents namespace has pod-security.kubernetes.io/enforce: restricted and pod-security.kubernetes.io/warn: restricted labels

As Kyle, I want the IaC to work on any standard K8s cluster so that I can eventually deploy to a second machine or a cloud provider without rewriting the infrastructure.
Acceptance criteria:
Storage class is local-path by default but overridable in values

As Kyle, I want a documented process to recover the agent stack after a machine reboot so that I don't lose time debugging why things aren't working.
Acceptance criteria:
ai-agents namespace is created by the bootstrap itself (helm install --create-namespace)

Vault Agent Injector vs. PSS restricted compatibility. The
Vault Agent Injector init container does not set a container-level
seccompProfile, which PSS restricted requires. HashiCorp closed
the feature request as NOT_PLANNED (vault-k8s #377). The
workaround is pod-level seccompProfile or the
vault.hashicorp.com/agent-json-patch annotation. This needs
testing to confirm it works on the current Vault Helm chart version
before committing to Agent Injector + PSS restricted.
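If the workaround holds up, it would look roughly like the following. This is an untested sketch: the annotation names come from vault-k8s, and whether the JSON patch paths apply cleanly depends on the chart version in use:

```shell
# Sketch of the PSS workaround: pod-level seccompProfile plus JSON patches
# on the injected agent containers. Verify against the deployed vault-k8s
# version before relying on this.
cat > agent-pod-pss-workaround.yaml <<'EOF'
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-json-patch: |
      [{"op": "add", "path": "/securityContext/seccompProfile",
        "value": {"type": "RuntimeDefault"}}]
    vault.hashicorp.com/agent-init-json-patch: |
      [{"op": "add", "path": "/securityContext/seccompProfile",
        "value": {"type": "RuntimeDefault"}}]
spec:
  securityContext:
    # Pod-level fallback: PSS restricted accepts this when a container
    # does not set its own seccompProfile.
    seccompProfile:
      type: RuntimeDefault
EOF
```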
Vault namespace PSS level. The Vault server itself may not
comply with PSS restricted. The Vault namespace likely needs
baseline enforcement, not restricted. The design doc should
determine what PSS level Vault's own namespace gets.
Controller webhook NetworkPolicy. The agent controller accepts webhooks on port 8080 but has no ingress NetworkPolicy. Should the bootstrap include an ingress rule allowing only specific sources, or is deny-all-ingress acceptable since webhooks are only used from kubectl port-forward?
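Either answer is cheap to express. A default-deny policy with an optional explicit 8080 allowance might look like this sketch (the pod label is an assumption; note also that kubectl port-forward traffic typically bypasses NetworkPolicy with most CNIs, which argues that deny-all may be sufficient):

```shell
# Sketch: default-deny ingress for the agent namespace, plus an optional
# explicit allowance for the webhook port. Labels are illustrative.
cat > agent-netpol.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ai-agents
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Optional: allow only the webhook port on the controller, if in-cluster
# callers ever need it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-controller-webhook
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: agent-controller
  policyTypes:
  - Ingress
  ingress:
  - ports:
    - protocol: TCP
      port: 8080
EOF
```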
Bootstrap orchestration approach. Three options exist: K3s's built-in HelmChart CRD (zero-dependency but K3s-specific), helmfile (portable but adds a dependency), or a shell script (simplest but least declarative). The design doc should evaluate these against the portability constraint.
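For comparison while evaluating, the shell-script option has roughly this shape. A sketch only: chart names, namespaces, and values-file paths are assumptions, not the real layout:

```shell
#!/usr/bin/env bash
# Sketch of a script-based bootstrap. All chart names, namespaces, and
# values files are illustrative assumptions.
set -euo pipefail

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

# Vault first: the agent stack depends on the injector webhook.
helm upgrade --install vault hashicorp/vault \
  --namespace vault --create-namespace \
  -f vault/values.yaml

# Block until Vault pods are Ready (with the chart's default readiness
# probe, that implies unsealed), not merely until Helm returns.
kubectl -n vault rollout status statefulset/vault --timeout=300s

# Then the agent stack into its hardened namespace.
helm upgrade --install agent-controller charts/agent-controller \
  --namespace ai-agents --create-namespace \
  -f agents/values.yaml
```

Using `helm upgrade --install` rather than `helm install` makes the script idempotent, so the same entry point serves both first bootstrap and later re-runs.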
Vault secret delivery mechanism. Agent Injector writes secrets to in-memory tmpfs (never touches etcd) but has PSS compatibility issues. Vault Secrets Operator is lighter-weight but syncs to K8s Secrets (stored in etcd). CSI Provider avoids etcd but doesn't support auto-renewal. The design doc should select the mechanism based on PSS compatibility testing results.
Vault storage backend. The current setup uses file storage. Raft (integrated storage) is forward-compatible if the cluster ever goes multi-node. Is Raft worth the complexity for a single-node setup, or is file storage sufficient?
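In configuration terms the difference is a single stanza. A side-by-side sketch (paths and node_id are illustrative):

```shell
# Sketch: the two Vault storage stanzas. Only one belongs in a real
# server config; path and node_id are illustrative assumptions.
cat > vault-storage-options.hcl <<'EOF'
# Current: file storage (single node only)
storage "file" {
  path = "/vault/data"
}

# Alternative: integrated Raft storage (single node now, multi-node later)
# storage "raft" {
#   path    = "/vault/data"
#   node_id = "vault-0"
# }
EOF
```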
Vault Agent Injector + PSS restricted incompatibility. If the
agent-json-patch workaround doesn't resolve the PSS violation,
the alternatives are: (a) drop to PSS baseline for the agent
namespace, (b) switch to Vault Secrets Operator (which syncs to K8s
Secrets, partially defeating the purpose), or (c) use the Vault CSI
Provider. Each alternative has tradeoffs documented in the research.
Bootstrap order sensitivity. Vault must be running and unsealed before agent pods can start (they depend on Vault Agent Injector). If the bootstrap script doesn't handle this ordering correctly, pods will crash-loop waiting for Vault. The script must wait for Vault readiness, not just Helm install completion.
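A minimal readiness gate for the script might look like this sketch. The helper is generic; the commands in the usage comment are assumptions about the deployment's pod and namespace names:

```shell
# Generic retry helper for bootstrap ordering: polls a command until it
# succeeds or the timeout expires.
wait_until() {  # usage: wait_until <timeout_seconds> <command> [args...]
  local timeout=$1; shift
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "wait_until: timed out waiting for: $*" >&2
      return 1
    fi
    sleep 2
  done
}

# Illustrative usage in the bootstrap (pod and namespace names assumed).
# `vault status` exits 0 only when Vault is unsealed, so the second gate
# covers unsealing, not just pod scheduling:
#   wait_until 300 kubectl -n vault get pod vault-0
#   wait_until 300 sh -c \
#     'kubectl -n vault exec vault-0 -- vault status >/dev/null 2>&1'
```

The point is that the agent Helm install is gated on Vault being reachable and unsealed, rather than on the Vault `helm install` merely returning.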
Vault webhook timeout on K3s. K3s runs the API server as a binary, not a pod, which can cause webhook timeouts if the Vault Injector service isn't reachable yet. This is a known K3s gotcha that could cause intermittent pod scheduling failures during bootstrap.
Scope creep into GitOps. The non-goal of GitOps reconciliation
means drift is possible — someone could kubectl edit a resource
and diverge from the repo. This is acceptable for a single-operator
setup but should be acknowledged.
Increased operational complexity. Adding Vault means an additional component to manage, unseal after reboots, and debug when things go wrong. The wiki guide and post-reboot script mitigate this, but Vault failures will block all agent workloads until resolved.