Link to PRD: Hardened IaC Bootstrap
The AI agent infrastructure (controller, publisher pipeline, journalist cron) runs on Rancher Desktop but was built incrementally with manual commands. It stores 13+ secrets in plain K8s Secrets, has no PSS enforcement, and no ResourceQuota. Meanwhile, the OpenClaw deployment — which the agent stack is replacing — already uses Vault, PSS restricted, and quotas. This design doc addresses how to bring the agent stack to the same (or better) security posture while making the entire setup reproducible from a single bootstrap process.
The technically interesting challenges are: (1) Vault Agent Injector compatibility with PSS restricted (HashiCorp closed the feature request as NOT_PLANNED), (2) secret delivery to short-lived Job pods vs. long-running Deployments, and (3) orchestrating a dependency-ordered bootstrap where Vault must be running and unsealed before any agent pod can start.
Goals:
- ai-agents namespace
- ai-agents
- infra/ai-agents/

Non-Goals:
graph TD
subgraph "Bootstrap Process"
HF[helmfile sync] --> V[Vault Release]
HF --> AC[Agent Controller Release]
AC -.->|needs| V
end
subgraph "vault namespace"
VP[vault-0 Pod] --> FS[File Storage PVC]
VI[Vault Injector] --> VP
end
subgraph "ai-agents namespace<br/>PSS: restricted"
NS[Namespace + PSS Labels]
RQ[ResourceQuota]
CTRL[Controller Deployment]
JOB[Agent Job Pods]
CTRL -->|creates| JOB
JOB -->|Vault annotations| VI
end
VP -->|K8s Auth| SA[agent ServiceAccount]
SA --> JOB
subgraph "Vault KV v2"
S1[secret/ai-agents/anthropic]
S2[secret/ai-agents/github]
S3[secret/ai-agents/discord]
S4[secret/ai-agents/webhook]
end
VP --> S1
VP --> S2
VP --> S3
VP --> S4
- infra/ai-agents/helmfile.yaml
- helmfile sync: full install from scratch
- helmfile apply: idempotent apply (diff + sync)
- infra/ai-agents/environments/

The helmfile declares two releases: vault (HashiCorp Helm chart) and
agent-controller (local chart). The agent-controller release uses
needs: ["vault/vault"] to ensure Vault is installed first. A
postsync hook on the vault release calls a script that waits for
Vault readiness but does not auto-init or auto-unseal; those
remain manual steps (init is one-time; unseal is post-reboot).
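
As a rough illustration of the two releases and their ordering, a helmfile along these lines would match the description. It assumes the public HashiCorp chart repository and a chart path relative to the helmfile; the hook script name bin/wait-for-vault.sh is hypothetical, since the design only says that a postsync hook waits for readiness.

```yaml
# Sketch of infra/ai-agents/helmfile.yaml (not the actual file).
repositories:
  - name: hashicorp
    url: https://helm.releases.hashicorp.com

releases:
  - name: vault
    namespace: vault
    chart: hashicorp/vault
    version: 0.29.1              # chart version used in the TASK-002 spike
    values:
      - vault/values.yaml
    hooks:
      - events: ["postsync"]
        showlogs: true
        command: bin/wait-for-vault.sh   # hypothetical readiness script

  - name: agent-controller
    namespace: ai-agents
    chart: ./agent-controller/helm       # assumed relative chart path
    needs:
      - vault/vault                      # <namespace>/<release name>
```

Note that needs only orders the releases. It says nothing about whether Vault has been initialized or unsealed, which is why the readiness hook and the documented manual steps are still required.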
- infra/ai-agents/vault/
- agent-controller ServiceAccount in ai-agents namespace
- ai-agents-read granting read on secret/data/ai-agents/*

Vault runs in its own vault namespace (reusing the existing Vault
installation from OpenClaw, reconfigured for the agent stack). The
Vault namespace gets PSS baseline with warn: restricted, because the
Vault server and injector pods cannot fully satisfy PSS restricted
due to seccomp and capabilities requirements in the injector webhook.
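
For illustration, the baseline-enforce plus restricted-warn combination maps onto the standard Pod Security Admission namespace labels as shown below. How the vault namespace is actually created (by hand, by helmfile, or by a manifest) is not stated in this doc, so treat the manifest form as an assumption.

```yaml
# Sketch: PSS labels for the vault namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: vault
  labels:
    # Reject pods that violate the baseline profile.
    pod-security.kubernetes.io/enforce: baseline
    # Admit, but warn on, anything that would fail restricted.
    pod-security.kubernetes.io/warn: restricted
```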
Secrets are split by concern under secret/ai-agents/:
| Path | Keys |
|---|---|
| secret/ai-agents/anthropic | oauth_token, disable_nonessential_traffic |
| secret/ai-agents/github | app_id, app_private_key, install_id |
| secret/ai-agents/discord | bot_token, guild_id, log_channel_id |
| secret/ai-agents/webhook | token |
| secret/ai-agents/openrouter | api_key |
The controller Deployment and agent Job pods receive these via Vault
Agent Injector annotations. The injector writes secrets to an in-memory
tmpfs volume at /vault/secrets/ — secrets never touch etcd.
- infra/ai-agents/agent-controller/ (moved from infra/ai-agents/agent-controller/)
- seccompProfile: RuntimeDefault at pod level
- /vault/secrets/config instead of K8s Secret envFrom

The controller Go code must be modified to:
- Add seccompProfile: RuntimeDefault
- Add capabilities: {drop: ["ALL"]} to all containers
- Replace envFrom secret references with a shell source pattern (. /vault/secrets/config) in the entrypoint command

Note: JSON-patch annotations for injected containers are NOT needed. Vault chart v0.29.1 injected containers satisfy PSS restricted by default (confirmed in TASK-002 spike).
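
To make the target shape concrete, here is a hedged sketch of a Job the controller could generate once those changes are in. The Job name, image repository, entrypoint path, and resource values are hypothetical; the ServiceAccount name, UID 1001, image tag 0.7, and the TTL are taken from elsewhere in this doc.

```yaml
# Sketch of a generated agent Job that passes PSS restricted.
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-example                  # hypothetical
  namespace: ai-agents
spec:
  ttlSecondsAfterFinished: 3600
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "ai-agents"
        # agent-inject-secret-config / agent-inject-template-config omitted here
    spec:
      restartPolicy: Never
      serviceAccountName: agent-controller
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        seccompProfile:
          type: RuntimeDefault         # inherited by injected containers
      containers:
        - name: agent
          image: ai-agent-runtime:0.7  # repository name is hypothetical
          command: ["/bin/sh", "-c"]
          args:
            # Source the rendered exports, then hand off to the real entrypoint.
            - . /vault/secrets/config && exec /entrypoint.sh   # hypothetical path
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          resources:                   # hypothetical values
            requests: {cpu: 250m, memory: 256Mi}
            limits: {cpu: "1", memory: 1Gi}
```

The resources block is included because a namespace with a compute ResourceQuota rejects pods whose containers do not declare the quota-tracked requests and limits.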
- ai-agents namespace with PSS labels and be the anchor for ResourceQuota
- infra/ai-agents/agent-controller/helm/templates/namespace.yaml
- enforce: restricted, warn: restricted
- --create-namespace)
- ai-agents
- infra/ai-agents/agent-controller/helm/templates/resourcequota.yaml

Conservative limits matching the single-pipeline model:
| Resource | Limit |
|---|---|
| requests.cpu | 2 |
| requests.memory | 4Gi |
| limits.cpu | 4 |
| limits.memory | 8Gi |
| pods | 8 |
The controller Deployment (50m CPU / 64Mi) leaves ample room for agent Jobs. Pod count of 8 allows for the controller + a few concurrent Jobs with their Vault init/sidecar containers.
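Note: ResourceQuota counts only pods in a non-terminal phase, so pods of
completed Jobs (Succeeded or Failed) stop consuming quota even before they
are deleted; a pod holds quota only while it is Pending or Running.
The existing ttlSecondsAfterFinished: 3600 on agent Jobs handles
cleanup of finished Jobs and their pods. This is sufficient for the single-pipeline model.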
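
The namespace labels and quota values above translate into manifests roughly like the following. This is a sketch of what namespace.yaml and resourcequota.yaml could render to; the ResourceQuota object name is hypothetical.

```yaml
# Sketch: rendered output of namespace.yaml and resourcequota.yaml.
apiVersion: v1
kind: Namespace
metadata:
  name: ai-agents
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ai-agents-quota        # hypothetical name
  namespace: ai-agents
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "8"
```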
The existing agent-egress NetworkPolicy needs two changes:
- Egress to the vault namespace (agent pods need to reach Vault for secret injection)
- Ingress from kube-system to the controller on port 8080 (defense-in-depth for webhook, even though kubectl port-forward bypasses NetworkPolicy)

Move all agent infrastructure under infra/ai-agents/:
infra/ai-agents/
  helmfile.yaml
  environments/
    default.yaml              # StorageClass, resource limits, image tags
  vault/
    values.yaml               # Vault Helm chart values
    policy.hcl                # ai-agents-read Vault policy
    network-policy.yaml       # Vault namespace NetworkPolicy
  agent-controller/           # (moved from infra/ai-agents/agent-controller/)
    helm/
      templates/
        namespace.yaml        # NEW: PSS labels
        resourcequota.yaml    # NEW
        networkpolicy.yaml    # MODIFIED: Vault egress + controller ingress
        deployment.yaml       # MODIFIED: Vault annotations
        secret.yaml           # DELETED (replaced by Vault)
        ...
    pkg/
      controller/
        controller.go         # MODIFIED: Vault annotations on Jobs
        ...
  ai-agent-runtime/           # (moved from infra/ai-agents/ai-agent-runtime/)
    ...
  bin/
    bootstrap.sh              # Wraps helmfile sync + prints manual steps
    store-secrets.sh          # Interactive Vault secret storage
    unseal.sh                 # Post-reboot Vault unseal + health check
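
The networkpolicy.yaml change noted in the layout (Vault egress + controller ingress) could look roughly like the sketch below. It assumes Vault listens on TCP 8200 and that namespaces carry the automatic kubernetes.io/metadata.name label; the second policy's name and the controller pod label are hypothetical, and the existing rules of agent-egress are omitted.

```yaml
# Sketch: the two NetworkPolicy additions in the ai-agents namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: ai-agents
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    # ...existing egress rules stay here...
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: vault
      ports:
        - protocol: TCP
          port: 8200
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: controller-ingress           # hypothetical name
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: agent-controller          # hypothetical pod label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 8080
```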
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "ai-agents"
  vault.hashicorp.com/agent-inject-secret-config: "secret/ai-agents/anthropic"
  vault.hashicorp.com/agent-inject-template-config: |
    {{- with secret "secret/ai-agents/anthropic" -}}
    export CLAUDE_CODE_OAUTH_TOKEN="{{ .Data.data.oauth_token }}"
    export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="{{ .Data.data.disable_nonessential_traffic }}"
    {{ end }}
    {{- with secret "secret/ai-agents/github" -}}
    export GITHUB_APP_ID="{{ .Data.data.app_id }}"
    export GITHUB_APP_PRIVATE_KEY="{{ .Data.data.app_private_key }}"
    export GITHUB_INSTALL_ID="{{ .Data.data.install_id }}"
    {{ end }}
    {{- with secret "secret/ai-agents/discord" -}}
    export DISCORD_BOT_TOKEN="{{ .Data.data.bot_token }}"
    export DISCORD_GUILD_ID="{{ .Data.data.guild_id }}"
    export DISCORD_LOG_CHANNEL_ID="{{ .Data.data.log_channel_id }}"
    {{ end }}
    {{- with secret "secret/ai-agents/webhook" -}}
    export AI_WEBHOOK_TOKEN="{{ .Data.data.token }}"
    {{ end }}
    {{- with secret "secret/ai-agents/openrouter" -}}
    export OPENROUTER_API_KEY="{{ .Data.data.api_key }}"
    {{ end }}
  # Each section closes with {{ end }} (no leading dash) so the newline after
  # its last export survives. With {{- end }} followed by {{- with, that newline
  # is trimmed and the next section's first export is glued onto the same line.
  # No agent-json-patch or agent-init-json-patch needed.
  # Vault chart v0.29.1 injected containers already include:
  # - allowPrivilegeEscalation: false
  # - capabilities.drop: [ALL]
  # - runAsNonRoot: true
  # seccompProfile: RuntimeDefault is satisfied via pod-level inheritance.
  # (Confirmed by TASK-002 spike 2026-03-17)
Usage: bin/bootstrap.sh [--build-images]
Prereqs: kubectl, helm, helmfile, docker on PATH; cluster reachable
Secrets: Vault init is manual; secrets stored via bin/store-secrets.sh
Steps:
1. helmfile sync (installs Vault + agent controller)
2. Prints manual steps: vault init, unseal, store secrets
3. Applies CRDs and sample AgentTask manifests
Usage: bin/unseal.sh
Reads: ~/.vault-init (unseal key)
Steps:
1. Unseals Vault
2. Verifies Vault is ready
3. Checks agent controller pod is Running
4. Reports status
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Helmfile | Declarative YAML, dependency ordering via needs, portable across K8s distros, parallel install of independent releases | Adds one binary dependency (helmfile) | Chosen: best balance of declarative config and portability |
| Shell script | Zero dependencies beyond helm/kubectl, easiest to understand | Least declarative, dependency ordering is manual, harder to make idempotent | Rejected — too fragile for multi-release ordering |
| K3s HelmChart CRD | Zero-dependency on K3s, declarative | K3s-specific (violates portability requirement), no dependency ordering | Rejected — violates PRD portability constraint |
| Makefile | No extra binary, dependency ordering via prerequisites | Not declarative for Helm releases, awkward for values files | Rejected — poor fit for Helm-native workflows |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Vault Agent Injector | Secrets on tmpfs (never in etcd), auto-renewal, existing pattern from OpenClaw; injected containers satisfy PSS restricted out-of-the-box in chart v0.29.1 (no JSON-patch needed) | Heavier (sidecar per pod) | Chosen — spike confirmed |
| Vault Secrets Operator | Lighter-weight, Helm values can satisfy PSS restricted | Syncs to K8s Secrets (stored in etcd), partially defeats purpose of Vault migration | Fallback if Injector fails spike |
| Vault CSI Provider | No etcd storage, lighter than Injector | No auto-renewal, socket permission issue (#296) blocks non-root, unresolved upstream | Rejected — too many open issues |
TASK-002 spike confirmed Agent Injector passes PSS restricted without any JSON-patch annotations (Vault chart v0.29.1, 2026-03-17). Option A is chosen. Fallback to Vault Secrets Operator is not needed.
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| File storage | Simple, proven, already in use for OpenClaw Vault | Single-node only, migration needed if going multi-node | Chosen — sufficient for single-node K3s, avoids unnecessary complexity |
| Raft integrated storage | Multi-node ready, no migration needed later | More complex setup, overkill for single-node | Rejected — YAGNI for v1 |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| PSS baseline + warn restricted | Vault server works without mlock; Injector webhook avoids seccomp violations | Not fully restricted | Chosen — Injector webhook cannot satisfy restricted without upstream changes |
| PSS restricted | Maximum security | Vault Injector webhook pod violates restricted (no seccomp, no capabilities drop in chart defaults); requires extensive overrides that may break functionality | Rejected — risk of breaking Vault webhook |
| No PSS | Simplest | Misses the security goal entirely | Rejected — contradicts PRD |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Split by concern | Enables future per-agent-type policies, clear ownership | More Vault paths to manage, more complex injection template | Chosen — forward-compatible with fine-grained access control |
| Single path | Simpler policy, one injection template | All-or-nothing access, no granularity for future scoping | Rejected — doesn't set up for per-agent policies |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| infra/ai-agents/ (consolidated) | Single directory for all agent infra, clear ownership, one helmfile | Requires moving existing files | Chosen: clean separation, matches the "one system" mental model |
| infra/ai-agents/agent-controller/ (keep existing) | No file moves needed | Vault config, runtime image, and controller scattered across directories | Rejected: fragmented ownership |
| Action | File | Rationale |
|---|---|---|
| CREATE | infra/ai-agents/helmfile.yaml | Bootstrap orchestration with dependency ordering |
| CREATE | infra/ai-agents/environments/default.yaml | Environment-specific values (StorageClass, resource limits) |
| CREATE | infra/ai-agents/vault/values.yaml | Vault Helm chart values (based on existing openclaw config) |
| CREATE | infra/ai-agents/vault/policy.hcl | ai-agents-read Vault policy for agent secret access |
| CREATE | infra/ai-agents/vault/network-policy.yaml | Vault namespace NetworkPolicy allowing ai-agents ingress |
| CREATE | infra/ai-agents/bin/bootstrap.sh | Wrapper around helmfile sync + manual step instructions |
| CREATE | infra/ai-agents/bin/store-secrets.sh | Interactive Vault secret storage for agent secrets |
| CREATE | infra/ai-agents/bin/unseal.sh | Post-reboot Vault unseal + health verification |
| MOVE | infra/ai-agents/agent-controller/ → infra/ai-agents/agent-controller/ | Directory consolidation |
| MOVE | infra/ai-agents/ai-agent-runtime/ → infra/ai-agents/ai-agent-runtime/ | Directory consolidation |
| CREATE | infra/ai-agents/agent-controller/helm/templates/namespace.yaml | Namespace with PSS restricted labels |
| CREATE | infra/ai-agents/agent-controller/helm/templates/resourcequota.yaml | Conservative resource limits |
| MODIFY | infra/ai-agents/agent-controller/helm/templates/networkpolicy.yaml | Add Vault egress + controller ingress rules |
| MODIFY | infra/ai-agents/agent-controller/helm/templates/deployment.yaml | Add Vault annotations for controller pod secrets |
| DELETE | infra/ai-agents/agent-controller/helm/templates/secret.yaml | Replaced by Vault secret injection |
| MODIFY | infra/ai-agents/agent-controller/helm/values.yaml | Remove secrets block, add Vault config values |
| MODIFY | infra/ai-agents/agent-controller/pkg/controller/controller.go | Add Vault annotations + PSS security context to Job specs |
| MODIFY | infra/ai-agents/agent-controller/helm/templates/deployment.yaml | Add seccompProfile to controller pod |
| DELETE | infra/ai-agents/agent-controller/bin/setup.sh | Replaced by helmfile-based bootstrap |
| CREATE | apps/blog/blog/markdown/wiki/devops/bootstrap.md | Wiki guide: bootstrap steps, post-reboot, troubleshooting |
- infra/ai-agents/helmfile.yaml, infra/ai-agents/environments/default.yaml
- infra/ai-agents/agent-controller/ moved to infra/ai-agents/agent-controller/
- infra/ai-agents/ai-agent-runtime/ moved to infra/ai-agents/ai-agent-runtime/
- helmfile.yaml declares vault and agent-controller releases with needs dependency
- environments/default.yaml has configurable StorageClass (default local-path)
- helmfile lint passes with no errors
- [P] ✓ COMPLETE
- agent-json-patch and agent-init-json-patch annotations tested against PSS restricted namespace: JSON-patch not needed; base annotations pass
- seccompProfile: RuntimeDefault (via pod-level inheritance), capabilities.drop: [ALL] ✓, runAsNonRoot: true ✓, allowPrivilegeEscalation: false ✓
- capabilities.drop: [ALL] verified on injected containers: present by default in Vault chart v0.29.1
- infra/ai-agents/agent-controller/helm/templates/namespace.yaml
- pod-security.kubernetes.io/enforce: restricted and warn: restricted labels
- --create-namespace
- helm upgrade --install applies the namespace manifest correctly
- ai-agents namespace are audited for PSS restricted compliance
- infra/ai-agents/agent-controller/helm/templates/resourcequota.yaml
- requests.cpu: 2, requests.memory: 4Gi, limits.cpu: 4, limits.memory: 8Gi, pods: 8
- infra/ai-agents/agent-controller/pkg/controller/controller.go, infra/ai-agents/agent-controller/helm/templates/deployment.yaml
- securityContext.seccompProfile.type: RuntimeDefault at pod level
- seccompProfile: RuntimeDefault on generated Job specs
- capabilities: {drop: ["ALL"]} on all containers in generated Job specs (init + main)
- kubectl get events)
- infra/ai-agents/vault/values.yaml, infra/ai-agents/vault/policy.hcl, infra/ai-agents/vault/network-policy.yaml
- gcpckms)
- ai-agents-read grants read on secret/data/ai-agents/* and secret/metadata/ai-agents/*
- ai-agents namespace on TCP 8200
- baseline enforce + restricted warn labels
- infra/ai-agents/bin/store-secrets.sh, helmfile postsync hook script
- ai-agents bound to agent-controller ServiceAccount in ai-agents namespace
- store-secrets.sh interactively prompts for secrets split by concern (anthropic, github, discord, webhook, openrouter)
- store-secrets.sh uses vault kv patch (merge) with put fallback
- secret/ai-agents/anthropic, secret/ai-agents/github, etc.
- infra/ai-agents/agent-controller/pkg/controller/controller.go, infra/ai-agents/agent-controller/helm/templates/deployment.yaml
- /vault/secrets/config instead of K8s Secret envFrom
- [P]
- infra/ai-agents/agent-controller/helm/templates/secret.yaml, infra/ai-agents/agent-controller/helm/values.yaml
- secret.yaml template deleted from Helm chart
- secrets block removed from values.yaml
- helm upgrade does not create or update the agent-secrets K8s Secret
- agent-secrets K8s Secret manually deleted from cluster
- agent-secrets via envFrom or valueFrom
- infra/ai-agents/agent-controller/helm/templates/networkpolicy.yaml
- kube-system on port 8080
- infra/ai-agents/bin/bootstrap.sh, infra/ai-agents/bin/unseal.sh
- bootstrap.sh checks prerequisites (kubectl, helm, helmfile, docker)
- bootstrap.sh runs helmfile sync and prints manual post-install steps (vault init, unseal, store-secrets)
- bootstrap.sh applies CRDs and sample AgentTask manifests (journalist cron, publisher manual)
- bootstrap.sh is idempotent (running twice produces no errors)
- unseal.sh N/A: GCP KMS auto-unseal replaces manual unseal
- infra/ai-agents/agent-controller/bin/setup.sh deleted
- apps/blog/blog/markdown/wiki/devops/bootstrap.md
- bootstrap.sh + manual Vault steps
- bootstrap.sh, complete manual Vault init/unseal/store-secrets
- kubectl get secrets -n ai-agents shows no agent-secrets object (only Helm release secrets)
- kubectl get events -n ai-agents shows no PSS violations (only quota events from concurrent startup)
- 0 12 * * * (noon UTC): GCP KMS auto-unseal means Vault is ready before cron fires; no manual re-triggering needed

Result: Option A confirmed. JSON-patch annotations are NOT required.
Vault Helm chart v0.29.1 injects containers that already satisfy all PSS
restricted requirements by default. Tested against a vault-pss-spike
namespace with pod-security.kubernetes.io/enforce=restricted.
Observed security context on injected containers (vault-agent-init and vault-agent):
{
"allowPrivilegeEscalation": false,
"capabilities": { "drop": ["ALL"] },
"readOnlyRootFilesystem": true,
"runAsGroup": 1000,
"runAsNonRoot": true,
"runAsUser": 100
}
seccompProfile: RuntimeDefault is not set per-container on injected
containers, but PSS restricted allows inheritance from the pod-level
securityContext.seccompProfile. With seccompProfile: RuntimeDefault
set at the pod level (as shown in the openclaw statefulset), all injected
containers satisfy the requirement.
Required annotation set (no JSON-patch needed):
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "ai-agents"
vault.hashicorp.com/agent-inject-secret-<name>: "<vault-path>"
vault.hashicorp.com/agent-inject-template-<name>: |
  {{- with secret "<vault-path>" -}}
  export KEY="{{ .Data.data.value }}"
  {{- end }}
Pod spec must include pod-level securityContext.seccompProfile.type: RuntimeDefault
to satisfy the seccompProfile inheritance. This is already the pattern from openclaw.
No evidence of agent-json-patch or agent-init-json-patch annotations being required.
The design doc's proposed JSON-patch annotations (path /securityContext/capabilities)
are unnecessary for Vault chart v0.29.1 — remove them from TASK-008 implementation.
Spike also confirmed: Zero PSS events (no warnings), secret injection
working (/vault/secrets/config readable by application container).
Vault is configured to auto-unseal via GCP Cloud KMS instead of Shamir keys. On restart, Vault calls GCP KMS to decrypt its master key — no manual unseal step required.
Resources created:
- vault-unseal (region us-east1, project kylepericak)
- ai-agents (symmetric, software protection)
- vault-unseal-ai-agents@kylepericak.iam.gserviceaccount.com
- projects/kylepericak/roles/vaultUnsealKMS, with permissions cloudkms.cryptoKeys.get, cloudkms.cryptoKeyVersions.useToEncrypt, cloudkms.cryptoKeyVersions.useToDecrypt (bound to the specific key only)

Credentials file: infra/ai-agents/vault/gcp-credentials.json (gitignored).
Loaded into a K8s Secret gcp-credentials in the vault namespace.
Vault mounts it at /vault/userconfig/gcp-credentials/gcp-credentials.json
and locates it via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
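
A hedged sketch of how this could be expressed in infra/ai-agents/vault/values.yaml follows. It assumes that vault-unseal is the key ring and ai-agents is the crypto key (the resource list does not label them), that the credentials Secret is mounted through the chart's extraVolumes mechanism, and that the listener runs without TLS inside the cluster. None of these are confirmed by this doc.

```yaml
# Sketch of the auto-unseal portion of vault/values.yaml (assumptions noted above).
server:
  extraEnvironmentVars:
    GOOGLE_APPLICATION_CREDENTIALS: /vault/userconfig/gcp-credentials/gcp-credentials.json
  extraVolumes:
    # Secret volumes declared here are mounted under /vault/userconfig/<name>.
    - type: secret
      name: gcp-credentials
  standalone:
    enabled: true
    config: |
      listener "tcp" {
        address     = "[::]:8200"
        tls_disable = 1            # assumption: in-cluster plaintext listener
      }
      storage "file" {
        path = "/vault/data"
      }
      seal "gcpckms" {
        project    = "kylepericak"
        region     = "us-east1"
        key_ring   = "vault-unseal"   # assumed to be the key ring
        crypto_key = "ai-agents"      # assumed to be the key
      }
injector:
  enabled: true
```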
To regenerate gcp-credentials.json on a new machine:
gcloud iam service-accounts keys create \
infra/ai-agents/vault/gcp-credentials.json \
--iam-account=vault-unseal-ai-agents@kylepericak.iam.gserviceaccount.com \
--project=kylepericak
kubectl create secret generic gcp-credentials \
--from-file=gcp-credentials.json=infra/ai-agents/vault/gcp-credentials.json \
--namespace=vault --dry-run=client -o yaml | kubectl apply -f -
~/.vault-init format changed: No longer stores an unseal key.
Only stores VAULT_ROOT_TOKEN and 5 VAULT_RECOVERY_KEY_* entries
(recovery keys authorize privileged operations such as regenerating the root token, which needs 3-of-5; they cannot unseal Vault if the KMS key is lost).
Cost: ~$0.06/month for the KMS key. First 20K operations/month free; auto-unseal on restart uses ≪1K operations/month.
Problem encountered: Pod-level runAsUser: 1001 combined with hostPath PVCs broke write
agents. K3s/containerd does NOT apply fsGroup ownership changes to hostPath volumes.
HostPath directories created by kubelet (HostPathDirectoryOrCreate) are root:root 755.
UID 1001 can't create subdirectories, so git clone failed.
Fix 1 — Shared workspace chmod: Applied chmod 1777 to /tmp/agent-workspace and
/tmp/agent-workspace/branches on the Lima VM via rdctl shell. Required for the shared
agent-workspace PVC used by read-only agents. One-time infra step; add to bootstrap
runbook.
Fix 2 — emptyDir for write agents: Switched write agents (journalist, publisher, qa)
from per-branch hostPath PVCs to emptyDir volumes. EmptyDir is created fresh per pod
and is writable by UID 1001. Write agents push all work to GitHub before pod termination,
so ephemeral workspace is safe. Per-branch PV/PVC creation eliminated for write agents;
cleanupBranchPVCs still runs but finds no new branch PVCs to clean up.
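
For reference, the emptyDir arrangement in Fix 2 corresponds to a pod template fragment like the one below. The volume name and the /workspace mount path are assumptions inferred from the /workspace/repo path mentioned in this section; the rest of the pod spec is omitted.

```yaml
# Fragment of a write-agent pod template (sketch, not the generated spec).
spec:
  securityContext:
    runAsUser: 1001
  containers:
    - name: agent
      volumeMounts:
        - name: workspace            # assumed volume name
          mountPath: /workspace      # assumed mount path
  volumes:
    - name: workspace
      emptyDir: {}                   # fresh per pod, discarded when the pod is removed
```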
Fix 3 — Remove git config global: The non-write agent gitSyncArgs previously ran
git config --global --add safe.directory to suppress git's "unsafe directory" warning.
Running as UID 1001 with HOME=/root, this failed (can't write to /root/.gitconfig). Since
the workspace (/workspace/repo) is owned by UID 1001, the safe.directory check doesn't
apply. Removed the git config step.
Image tag 0.7 includes all three fixes. Controller Deployment pod spec also had
seccompProfile moved from container level to pod level.
JSON Patch add vs replace: Resolved by TASK-002 spike.
JSON-patch annotations are not needed. Vault chart v0.29.1 injected
containers already satisfy PSS restricted. No agent-json-patch or
agent-init-json-patch annotations are required.
Vault Injector + capabilities.drop: Resolved by TASK-002 spike.
capabilities.drop: [ALL] is set by default on injected containers
(vault-agent-init and vault-agent) in Vault chart v0.29.1. No JSON-patch
needed to add it.
GitHub App private key in Vault: The GITHUB_APP_PRIVATE_KEY is
a multi-line PEM file. Vault KV v2 stores it as a string, but the
injection template must preserve newlines. The store-secrets.sh
script needs to handle file-based input (not interactive prompt) for
this key. What blocks: TASK-007 implementation.
Controller's own secrets: The controller Deployment currently reads
GITHUB_APP_PRIVATE_KEY from the K8s Secret to sign JWTs for GitHub
App auth. With Vault injection, the controller must read the PEM from
a file (/vault/secrets/github-key) instead of an env var. This
requires a code change in controller.go to support file-based key
loading. What blocks: TASK-008 implementation.
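
One way to deliver the PEM as a file is a second injector template that renders only the key, sketched below. The file name comes from the annotation suffix, so the github-key suffix yields /vault/secrets/github-key. Whether newlines survive the round trip still has to be verified as planned, and this annotation pair is an illustration rather than the implemented configuration.

```yaml
# Sketch: extra annotations that render the raw PEM to /vault/secrets/github-key.
annotations:
  vault.hashicorp.com/agent-inject-secret-github-key: "secret/ai-agents/github"
  vault.hashicorp.com/agent-inject-template-github-key: |
    {{- with secret "secret/ai-agents/github" -}}
    {{ .Data.data.app_private_key }}
    {{- end }}
```

Rendering the key without an export wrapper or shell quoting keeps the file in plain PEM form, so the controller can load it directly instead of parsing it out of an environment variable.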
Vault Agent Injector PSS failure. Resolved. TASK-002 spike
(2026-03-17) confirmed Vault chart v0.29.1 injected containers satisfy
all PSS restricted requirements by default. No JSON-patch workaround
needed. Fallback to Vault Secrets Operator is not required.
Bootstrap order sensitivity. Vault must be running and unsealed
before agent pods can start. If helmfile installs the agent controller
before Vault is ready (despite needs), pods will crash-loop.
Mitigation: helmfile needs + postsync readiness check on Vault
release + manual unseal step documented clearly.
Vault webhook timeout on K3s. K3s runs the API server as a binary, not a pod, which can cause webhook timeouts if the Vault Injector mutating webhook isn't reachable during pod creation. Mitigation: bootstrap script waits for Vault Injector webhook to be ready before proceeding.
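Completed Jobs consuming quota. Agent Jobs with
ttlSecondsAfterFinished: 3600 leave finished pods in place for up to an
hour after completion. Kubernetes counts only non-terminal pods against
ResourceQuota, so pods that have reached Succeeded or Failed do not hold
quota; with pods: 8 and conservative CPU/memory limits, the realistic
exposure is a burst of Jobs whose pods are all still Pending or Running,
which could temporarily exhaust quota. Mitigation:
single-pipeline model limits concurrency; monitoring via Discord
notifications.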
Multi-line PEM in Vault template. The GitHub App private key is
a multi-line PEM that must survive Vault template rendering without
newline corruption. If the Go template {{ .Data.data.app_private_key }}
mangles newlines, the controller's JWT signing will fail silently.
Mitigation: test PEM round-trip in TASK-007.
Directory move breaks CI/references. Moving infra/ai-agents/agent-controller/
may break Docker build paths, Go module paths, or references in
CLAUDE.md and wiki pages. Mitigation: TASK-001 includes updating
all references; trivy skip-dirs in CLAUDE.md may need updating.