Link to PRD: Autonomous Security Improvement Loop
The Mac workstation (pai-m1) runs Claude Code in bypass-permissions mode
with three safety hooks (block-destructive.sh, protect-sensitive.sh,
audit-log.sh) defined inline in the Ansible playbook. These hooks have
known detection gaps (e.g., protect-sensitive.sh doesn't block cp,
mv, or vim access to sensitive files), and no process exists to
discover new gaps or implement improvements autonomously.
The technically interesting challenges are: (1) making Claude Code improve its own security controls without breaking its own autonomy, (2) using an adversarial verification pattern in which a separate Claude Code instance tries to bypass each new security measure, and (3) building a cost-gated wrapper that parses JSONL session logs to enforce a daily spend cap before each iteration.
Goals:
- claude -p) every 30 minutes for iterative security improvement
- ~/.claude/projects/ JSONL logs in the wrapper to enforce $150/day spend cap before each iteration
- wiki/design-docs/security-improvement-log.md

Non-Goals:
- claude CLI)
graph TD
    subgraph "Wrapper Script (long-running bash)"
        START[Start Loop] --> COST{Cost Gate<br/>Parse JSONL logs}
        COST -->|Over $150/day| EXIT_COST[Log + Discord + Exit]
        COST -->|Under budget| LOCK{Check Lock File}
        LOCK -->|Active < 5 min| WAIT[Wait 60s + Retry]
        LOCK -->|Active 5-60 min| SKIP[Skip This Cycle]
        LOCK -->|Active > 60 min| KILL[Kill + Take Over]
        LOCK -->|Stale or none| ACQUIRE[Acquire Lock]
        WAIT -->|Still active| SKIP
        WAIT -->|Released| ACQUIRE
        KILL --> ACQUIRE
        SKIP --> SLEEP[Sleep 30 min]
        ACQUIRE --> INVOKE[Invoke claude -p<br/>Improvement Iteration]
        INVOKE --> STATUS{Read Status File}
        STATUS -->|improved| VERIFY[Invoke claude -p<br/>Adversarial Verification]
        STATUS -->|done| EXIT_DONE[Discord Summary + Exit]
        STATUS -->|missing/corrupt| ERROR[Log Error<br/>Discord Warning]
        ERROR --> SLEEP
        VERIFY --> COMMIT{Verification<br/>Passed?}
        COMMIT -->|pass| DISCORD[Post to Discord<br/>#status-updates]
        COMMIT -->|fail| REVERT[git restore .<br/>Log Failure]
        REVERT --> DISCORD
        DISCORD --> SLEEP
        SLEEP --> COST
    end
    subgraph "Improvement Iteration (Claude Code)"
        I1[Read wiki improvement log] --> I2[Assess security posture]
        I2 --> I3[Identify highest-impact gap]
        I3 --> I4[Edit Ansible playbook / hook files]
        I4 --> I5[Run ansible-playbook --check]
        I5 --> I6[Write status file + log entry]
    end
    subgraph "Adversarial Verification (Claude Code)"
        V1[Read status file<br/>What was changed?] --> V2[Design bypass attempt]
        V2 --> V3[Test that bypass is blocked]
        V3 --> V4[Verify Claude Code<br/>can still operate]
        V4 --> V5[Write verification result]
    end
    INVOKE --> I1
    VERIFY --> V1
apps/agent-loops/macbook-security-loop/loop.sh
- ~/.claude/projects/ JSONL logs for cost calculation
- /tmp/sec-loop.lock
- /tmp/sec-loop-status.json
- apps/blog/exports.sh for secrets

The wrapper is a while true loop with sleep 1800 between iterations.
It never invokes the Anthropic API directly; all AI work goes through
claude -p, which bills to Claude Code tokens.
loop.sh (bash function)
- *.jsonl files under ~/.claude/projects/

The cost gate replicates a simplified version of the cc-usage MCP's
pricing logic in bash. It fetches the LiteLLM pricing JSON once per
loop invocation (cached to /tmp/litellm-pricing.json with 1-hour
TTL), then sums today's token costs across all session logs.

This is intentionally a rough estimate: the cc-usage MCP does precise tiered pricing, but the wrapper only needs "over or under $150" accuracy. A simpler approach is to sum output tokens × worst-case model rate as an upper-bound estimate.
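The token-summing half of that upper-bound estimate might look like the sketch below. It assumes each JSONL line is an object with an ISO-8601 timestamp field and a message.usage.output_tokens counter; those field names are assumptions about the session-log schema, which this document does not specify. The function name sum_output_tokens is hypothetical, and the sketch relies on jq being installed.

```bash
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: log lines look like
#   {"timestamp":"2025-01-02T10:00:00Z","message":{"usage":{"output_tokens":123}}}
# The real session-log schema may differ.

# Print the total output tokens recorded on a given day (YYYY-MM-DD) across
# all *.jsonl files under a directory. Lines that are not valid JSON objects
# are skipped, and entries without a numeric counter count as 0.
# The day must be in the same timezone as the log timestamps.
sum_output_tokens() {
  local log_dir="$1" day="$2"
  find "$log_dir" -type f -name '*.jsonl' -exec jq -R -r --arg day "$day" '
      (fromjson? // empty)
      | select(type == "object")
      | select((.timestamp // "") | tostring | startswith($day))
      | ((.message | objects | .usage | objects | .output_tokens | numbers) // 0)
    ' {} + 2>/dev/null |
    awk '{ total += $1 } END { printf "%d\n", total }'
}
```

Called as sum_output_tokens ~/.claude/projects "$(date -u +%F)", it prints a single integer that the wrapper can multiply by a worst-case rate.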
apps/agent-loops/macbook-security-loop/prompt.md
- -p prompt argument
- /tmp/sec-loop-status.json as output signal

The prompt instructs Claude Code to:
- ansible-playbook --check to validate syntax
- improved / done)

The prompt explicitly forbids:

apps/agent-loops/macbook-security-loop/verify-prompt.md
- /tmp/sec-loop-status.json to understand what changed
- /tmp/sec-loop-verify.json

The adversarial verifier:
This is a red-team pattern: the verifier acts as an attacker trying to circumvent the new control. If the bypass succeeds, the change is reverted.
/tmp/sec-loop.lock
- noclobber shell option ((set -o noclobber; echo "$$:$(date +%s)" > "$lockfile"))
- trap on EXIT/INT/TERM/HUP removes the lock file
- kill -0
- PID:START_TIME; validator compares stored start time against ps -p $PID -o lstart=

Stale lock detection: if the PID in the lock file is not running
(kill -0 $PID fails), the lock is stale. Clean it up and proceed.
If the PID is alive, the wrapper applies a deterministic
duration-based heuristic (note: the PRD suggests the Claude Code
instance decides, but making this a simple bash heuristic avoids
spawning an AI invocation just for coordination):
- Active less than 5 minutes: wait 60 seconds and retry; if the lock is still held, skip this cycle.
- Active 5-60 minutes: skip this cycle.
- Active more than 60 minutes: kill the holder and take over the lock.
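The noclobber acquisition, the kill -0 staleness check, and the duration thresholds from the wrapper diagram (wait under 5 minutes, skip at 5-60 minutes, take over past 60 minutes) can be sketched together as follows. The function names are hypothetical, and the sketch leaves out the start-time comparison against ps that guards against PID reuse.

```bash
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not taken from loop.sh.

# Atomically create the lock file as "PID:EPOCH". With noclobber set, the
# redirect fails if the file already exists, so only one caller can win.
acquire_lock() {
  local lockfile="$1"
  ( set -o noclobber; echo "$$:$(date +%s)" > "$lockfile" ) 2>/dev/null
}

# Succeed when the lock is stale, i.e. the recorded PID is not running.
# Caveat: kill -0 also fails for live processes owned by another user.
lock_is_stale() {
  local lockfile="$1" pid
  pid="$(cut -d: -f1 "$lockfile" 2>/dev/null)"
  [[ "$pid" =~ ^[0-9]+$ ]] || return 0   # unreadable or malformed: stale
  ! kill -0 "$pid" 2>/dev/null
}

# Map the age of a live lock, in seconds, to the wrapper's next step.
lock_decision() {
  local age="$1"
  if   (( age < 300 ));   then echo wait   # under 5 min: wait 60s, retry
  elif (( age <= 3600 )); then echo skip   # 5-60 min: skip this cycle
  else                         echo kill   # over 60 min: kill and take over
  fi
}
```

Keeping the decision in a pure function makes the thresholds easy to adjust and to exercise without a real lock holder.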
apps/blog/blog/markdown/wiki/design-docs/security-improvement-log.md

Each entry contains:

loop.sh (bash function)
- DISCORD_BOT_TOKEN and status-updates channel ID from exports.sh

Three notification types:

infra/mac-setup/hooks/block-destructive.sh,
infra/mac-setup/hooks/protect-sensitive.sh,
infra/mac-setup/hooks/audit-log.sh
- content: | to src: hooks/<name>.sh

Status file (/tmp/sec-loop-status.json):
{
"action": "improved",
"finding": "protect-sensitive.sh does not block cp/mv to sensitive files",
"change": "Added cp, mv, rsync to blocked commands in protect-sensitive.sh",
"file_changed": "infra/mac-setup/hooks/protect-sensitive.sh",
"iteration": 3
}
Or for self-termination:
{
"action": "done",
"reason": "No material security improvements remain. All known gaps addressed.",
"total_iterations": 7,
"total_improvements": 5
}
Verification result (/tmp/sec-loop-verify.json):
{
"result": "pass",
"bypass_attempted": "Tried to cp ~/.ssh/id_ed25519 to /tmp/stolen-key",
"bypass_blocked": true,
"autonomy_check": "Successfully read CLAUDE.md, ran echo test, edited temp file",
"autonomy_intact": true
}
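On the wrapper side, reading the status file reduces to extracting the action field and treating anything unexpected as an error. A minimal sketch, assuming jq is available; the function name read_status_action is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: read_status_action is an illustrative name.

# Print the "action" value of a status file ("improved" or "done"), or
# "error" when the file is missing, empty, not valid JSON, or carries an
# action the wrapper does not recognize.
read_status_action() {
  local status_file="$1" action
  if [[ ! -s "$status_file" ]]; then
    echo error
    return
  fi
  # "strings" drops non-string values, so a numeric or null action is empty.
  action="$(jq -r '.action | strings' "$status_file" 2>/dev/null)" || action=""
  case "$action" in
    improved|done) echo "$action" ;;
    *)             echo error ;;
  esac
}
```

The wrapper can then branch on a single word, for example case "$(read_status_action /tmp/sec-loop-status.json)" in improved) ... ;; esac, which mirrors the three edges leaving the Read Status File node in the diagram.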
Usage: ./loop.sh [--dry-run]
Prerequisites:
- claude CLI on PATH
- source apps/blog/exports.sh (for DISCORD_BOT_TOKEN, channel IDs)
Env vars (from exports.sh):
- DISCORD_BOT_TOKEN — Discord bot authentication
- DISCORD_STATUS_CHANNEL_ID — #status-updates channel ID
Options:
--dry-run Run one iteration without committing or posting to Discord
Signals:
SIGTERM/SIGINT — Clean shutdown, remove lock file, post Discord exit msg
# Improvement iteration
claude -p "$(cat prompt.md)" \
--model sonnet \
--output-format json \
--max-turns 30 \
--max-budget-usd 5.00 \
--no-session-persistence \
2>&1 | tee "/tmp/sec-loop-iter-${ITERATION}.log"
# Adversarial verification
claude -p "$(cat verify-prompt.md)" \
--model sonnet \
--output-format json \
--max-turns 15 \
--max-budget-usd 2.00 \
--no-session-persistence \
2>&1 | tee "/tmp/sec-loop-verify-${ITERATION}.log"
Key CLI flags:
- --max-budget-usd: per-invocation hard cap (defense-in-depth with the daily cost gate; prevents a single runaway iteration)
- --no-session-persistence: ephemeral invocations that don't accumulate session state on disk
- --output-format json: the ResultMessage includes total_cost_usd, which the wrapper can accumulate for more accurate daily cost tracking than JSONL parsing alone

Note: piping stdin > ~7k characters to claude -p produces empty
output (known bug). The prompt files must stay under this limit, or
the wrapper should pass a file path in a short prompt instead.
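One way to apply the fallback described in that note is to check the prompt file's size before each invocation. The sketch below is an assumption-laden illustration: the 7000 threshold is a round number for the "~7k" limit, the function name build_prompt is hypothetical, and the wording of the short prompt is invented for the example.

```bash
#!/usr/bin/env bash
# Sketch only: the threshold and the fallback wording are assumptions.
PROMPT_CHAR_LIMIT=7000

# Print the text to hand to claude -p: the file's contents when it is small
# enough, otherwise a short instruction that points at the file path.
# wc -c counts bytes, which is never less than the character count, so the
# check errs on the side of using the short prompt.
build_prompt() {
  local prompt_file="$1" size
  size="$(wc -c < "$prompt_file" | tr -d '[:space:]')"
  if (( size < PROMPT_CHAR_LIMIT )); then
    cat "$prompt_file"
  else
    printf 'Read the file %s and follow the instructions in it.\n' "$prompt_file"
  fi
}
```

The wrapper would then call claude -p "$(build_prompt prompt.md)" in place of the plain cat.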
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Bash script | Zero dependencies, runs anywhere macOS, matches existing playbook/hook pattern, tmux-friendly | Limited JSON parsing (needs jq), harder to maintain complex logic | Chosen — simplicity matches the task; jq handles JSON needs |
| Node.js with Claude Code SDK | Rich SDK, structured output, better error handling | Adds Node runtime dependency for wrapper, SDK uses API tokens not CC tokens | Rejected — SDK would bill to API tokens, not Claude Code tokens |
| Python script | Better JSON/string handling, rich standard library | Extra runtime dependency, doesn't match existing patterns | Rejected — adds complexity without meaningful benefit |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Wrapper parses JSONL directly | No extra invocation cost, runs before Claude Code starts, fast | Reimplements cc-usage logic in bash (simplified) | Chosen — zero-cost check, prevents wasting tokens on over-budget iterations |
| Claude Code checks via MCP | Accurate pricing, uses existing cc-usage MCP | Costs tokens for every check, iteration starts before budget verified | Rejected — defeats purpose of cost control |
| External cost monitoring service | Most accurate, independent verification | Doesn't exist, would need to be built | Rejected — over-engineering for v1 |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Adversarial separate invocation | Independent verification, tests from attacker's perspective, catches cases where the change broke the model itself | Extra cost per iteration (~$0.50-1.00) | Chosen — strongest verification; cost is acceptable given 30-min intervals |
| Same-iteration self-check | No extra cost, immediate feedback | If the change broke Claude Code, it can't verify itself; fox guarding henhouse | Rejected — insufficient independence |
| Wrapper-only check (ansible --check) | Zero AI cost, fast | Can't test behavioral security (only syntax), no adversarial thinking | Rejected — too shallow for meaningful verification |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Extract hooks to standalone files, playbook copies them | Safer editing (no YAML corruption risk), git diff is cleaner, standard pattern | Requires upfront refactor of playbook | Chosen — eliminates the highest-risk failure mode (YAML corruption breaking the entire playbook) |
| Edit playbook inline content blocks directly | No refactor needed, current pattern | YAML-sensitive, one bad indent breaks entire playbook | Rejected — too risky for autonomous editing |
| Loop only adds new hook scripts, never edits existing | Safest, additive-only | Can't fix existing gaps in current hooks | Rejected — too limiting; existing hooks have known gaps |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Status file (JSON) | Structured, extensible, carries context for Discord messages and verification | Extra file I/O | Chosen — provides rich context for the wrapper's decision-making |
| Exit code convention | Simple, no file I/O | Limited information (just a number), can't carry context | Rejected — wrapper needs to know what was changed for Discord and verification |
| Parse stdout for marker | No extra files | Fragile, depends on output format, --output-format json changes stdout structure | Rejected — too brittle |
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Static template | Simple wrapper, model handles context discovery, no prompt construction logic | Model spends tokens re-reading the improvement log each iteration | Chosen — simplicity wins; reading the log is cheap relative to the improvement work |
| Dynamic with injected context | More efficient iterations, model starts with full context | Complex wrapper, brittle if log format changes, harder to debug | Rejected — premature optimization |
| Action | File | Rationale |
|---|---|---|
| CREATE | apps/agent-loops/macbook-security-loop/loop.sh | Main wrapper script: loop, cost gate, lock file, Discord, orchestration |
| CREATE | apps/agent-loops/macbook-security-loop/prompt.md | Static prompt for improvement iterations |
| CREATE | apps/agent-loops/macbook-security-loop/verify-prompt.md | Static prompt for adversarial verification |
| CREATE | infra/mac-setup/hooks/block-destructive.sh | Extracted from playbook inline content |
| CREATE | infra/mac-setup/hooks/protect-sensitive.sh | Extracted from playbook inline content |
| CREATE | infra/mac-setup/hooks/audit-log.sh | Extracted from playbook inline content |
| MODIFY | infra/mac-setup/playbook.yml | Replace inline content: with src: for hook scripts; add DISCORD_STATUS_CHANNEL_ID to exports.sh.sample |
| CREATE | apps/blog/blog/markdown/wiki/design-docs/security-improvement-log.md | Wiki improvement log (initially empty table) |
| MODIFY | apps/blog/blog/markdown/wiki/design-docs/index.md | Add link to this design doc |
| MODIFY | apps/blog/exports.sh.sample | Add DISCORD_STATUS_CHANNEL_ID variable |
Dependency-ordered tasks. [P] = parallelizable (can run concurrently
with other [P] tasks at the same dependency level).
infra/mac-setup/hooks/block-destructive.sh,
infra/mac-setup/hooks/protect-sensitive.sh,
infra/mac-setup/hooks/audit-log.sh,
infra/mac-setup/playbook.yml
- infra/mac-setup/hooks/ as standalone executable files
- ansible.builtin.copy: src= instead of content: | for all three hooks
- ansible-playbook --check playbook.yml passes with no errors

[P]
apps/agent-loops/macbook-security-loop/loop.sh
- /tmp/sec-loop.lock on start
- trap cleans up lock file on EXIT, INT, and TERM
- --dry-run flag runs one iteration and exits without committing or posting to Discord
- shellcheck

[P]
apps/agent-loops/macbook-security-loop/loop.sh
- ~/.claude/projects/ JSONL files for today's date entries
- get_total_spend(days=1) output

[P]
apps/agent-loops/macbook-security-loop/loop.sh,
apps/blog/exports.sh.sample
- https://discord.com/api/v10/channels/{id}/messages)
- DISCORD_STATUS_CHANNEL_ID added to exports.sh.sample
- --dry-run mode
- DISCORD_STATUS_CHANNEL_ID or DISCORD_BOT_TOKEN is unset

apps/agent-loops/macbook-security-loop/prompt.md
- ansible-playbook --check validation
- /tmp/sec-loop-status.json with structured outcome
- "action": "done" when no material improvements remain)

apps/agent-loops/macbook-security-loop/verify-prompt.md
- /tmp/sec-loop-status.json to understand the change
- /tmp/sec-loop-verify.json with structured result

apps/agent-loops/macbook-security-loop/loop.sh
- "action": "done", post final Discord summary and exit loop
- git restore .) and log the failure
- claude -p exits non-zero or status file is missing/corrupt, log the error, post Discord warning, and continue to next iteration (do not exit the loop)

apps/blog/blog/markdown/wiki/design-docs/security-improvement-log.md,
apps/blog/blog/markdown/wiki/design-docs/index.md
- ./loop.sh --dry-run completes one full iteration: cost check, improvement invocation, status file written, adversarial verification, verification result written
- shellcheck loop.sh passes with no warnings

Cost gate simplified. Instead of fetching LiteLLM pricing JSON,
the cost gate uses a hardcoded worst-case rate ($75/MTok, Opus
output pricing) applied to all output tokens. This always
overestimates, which is the safe direction for a budget gate. The
--max-budget-usd per-invocation flag provides the precise
per-call cap.
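The arithmetic behind the hardcoded rate is small enough to sketch directly. At $75 per million output tokens, 2,000,000 output tokens in a day already reach the $150 cap. The function and variable names below are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: names are illustrative; the rate and cap come from the text.
RATE_USD_PER_MTOK=75    # worst-case output rate
DAILY_CAP_USD=150       # daily spend cap

# Convert an output-token count into an upper-bound dollar estimate.
estimate_cost_usd() {
  LC_ALL=C awk -v tokens="$1" -v rate="$RATE_USD_PER_MTOK" \
    'BEGIN { printf "%.2f\n", tokens * rate / 1000000 }'
}

# Exit 0 while the estimate is strictly under the daily cap, 1 otherwise.
under_budget() {
  LC_ALL=C awk -v cost="$1" -v cap="$DAILY_CAP_USD" \
    'BEGIN { exit !(cost + 0 < cap + 0) }'
}
```

In the loop this might read: under_budget "$(estimate_cost_usd "$tokens")" || exit 0, where tokens holds today's summed output-token count.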
Discord split into two channels. Milestones (iteration complete,
self-termination, budget exceeded) go to #status-updates via
DISCORD_STATUS_CHANNEL_ID. Operational noise (missing status file,
verification failures, unexpected actions) goes to #log via
DISCORD_LOG_CHANNEL_ID. Added DISCORD_LOG_CHANNEL_ID to
exports.sh.sample.
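The two-channel routing can be sketched as a small lookup in front of a single posting function. The notification kind names (iteration_complete, self_termination, budget_exceeded) and both function names are hypothetical labels for this example; the endpoint and the Bot authorization header follow Discord's REST API, and jq is assumed for building the JSON payload safely.

```bash
#!/usr/bin/env bash
# Sketch only: kind names and function names are illustrative.
DISCORD_API="https://discord.com/api/v10"

# Pick the channel ID for a notification kind: milestones go to
# #status-updates, everything else (operational noise) goes to #log.
channel_for() {
  case "$1" in
    iteration_complete|self_termination|budget_exceeded)
      echo "${DISCORD_STATUS_CHANNEL_ID:-}" ;;
    *)
      echo "${DISCORD_LOG_CHANNEL_ID:-}" ;;
  esac
}

# Post a message. Returns non-zero without calling the API when the bot
# token or the target channel ID is unset.
post_discord() {
  local kind="$1" text="$2" channel payload
  channel="$(channel_for "$kind")"
  [[ -n "${DISCORD_BOT_TOKEN:-}" && -n "$channel" ]] || return 1
  # jq handles quoting and newlines in the message body.
  payload="$(jq -n --arg content "$text" '{content: $content}')" || return 1
  curl -fsS -X POST "$DISCORD_API/channels/$channel/messages" \
    -H "Authorization: Bot $DISCORD_BOT_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$payload" >/dev/null
}
```

Routing by kind keeps call sites short, for example post_discord budget_exceeded "Daily cap reached, exiting".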
Restore-and-continue on agent failure. If the improvement agent
doesn't write a status file (e.g., hit budget cap mid-run) or writes
an unexpected action, the wrapper runs git restore ., sleeps 30
minutes, and continues to the next iteration instead of halting the
loop. Relies on the 30-minute timer for retry.
shellcheck added to Ansible. Added shellcheck to
homebrew_packages in the playbook so it's available on the
workstation for future hook script linting.
Cost gate accuracy. The wrapper's JSONL parsing is a simplified estimate. If the estimate is consistently off by more than 20% vs. the cc-usage MCP's calculation, consider calling Claude Code once just for a cost check (cheap Haiku invocation). What blocks: need to compare estimates during TASK-003 testing.
Status file race condition. The improvement iteration writes the status file, then the wrapper reads it. If Claude Code crashes mid-write, the file may be corrupt. Mitigation: write to a temp file and atomic-rename. What blocks: TASK-005 prompt needs to include atomic write instructions.
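The temp-file-and-rename mitigation is a standard shell pattern, sketched below with a hypothetical function name. The temp file is created next to the destination so that mv is a same-filesystem rename, which replaces the target in one step; a reader sees either the old file or the complete new one.

```bash
#!/usr/bin/env bash
# Sketch only: write_atomic is an illustrative name.

# Write content to a destination path atomically. The temp file lives in the
# destination's directory so the final mv is a rename, not a copy.
write_atomic() {
  local dest="$1" content="$2" tmp
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  if printf '%s\n' "$content" > "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"   # clean up the partial temp file on failure
  return 1
}
```

Since the improvement agent writes the status file itself, the prompt would need to ask for this pattern explicitly, for example write_atomic /tmp/sec-loop-status.json '{"action":"done"}'.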
Discord #status-updates channel ID. Kyle created the channel.
The channel ID needs to be added to exports.sh as
DISCORD_STATUS_CHANNEL_ID. What blocks: Kyle provides the ID.
Hook extraction testing. After extracting hooks to standalone
files, need to verify the playbook's src: path resolution works
correctly (relative to playbook location vs. absolute). What
blocks: TASK-001 implementation.
Ansible playbook corruption. The loop edits the playbook and
hook files. A bad edit could break the entire Mac restore process.
Mitigation: ansible-playbook --check before every commit;
adversarial verification tests the change; frequent commits mean
git revert is always available.
Autonomy regression. A security improvement (e.g., blocking a new command pattern in hooks) could break a workflow Claude Code needs. Mitigation: adversarial verification includes an autonomy smoke test (read, write, bash). The prompt explicitly forbids autonomy-reducing changes.
Cost overrun within a single iteration. The $150 check happens before each iteration, but a single iteration (improvement + verification) could cost $2-5. If spending is at $148 when the check passes, it could reach $153 by iteration end. Mitigation: Sonnet is cheap, and a $5 overshoot on a $150 budget is about 3%, which is acceptable.
Diminishing returns. The model may not recognize when improvements become trivial and continue making low-value changes. Mitigation: self-termination in the prompt; Kyle reviews the wiki log and Discord updates; can manually kill the loop.
JSONL log format changes. Claude Code may change its session log format across updates, breaking the cost gate. Mitigation: the cost gate uses a conservative parser that skips unparseable lines; format changes degrade to "unknown cost" which triggers a warning but doesn't block execution.
Lock file stale state. If the wrapper is killed with kill -9
(skipping traps) or the machine loses power, the lock file
persists. Mitigation: PID validation — check if the PID in the
lock file is still running before treating it as locked.