The autonomous security improvement loop ran from 2026-03-18 to 2026-03-22 (~4 days, ~60 iterations, 68 commits). Expected: comprehensive hardening of the Mac workstation. Actual: a handful of real findings buried in significant wasted effort, one major component removed as architecturally incompatible, and several findings re-discovered 8+ times without lasting resolution.
The loop was capable and fast but aimed at the wrong target.
68 commits in 4 days reflects genuine execution velocity. The problem was what those commits
were doing: iterating on protect-sensitive.sh (ultimately removed), re-fixing the same
.mcp.json permission repeatedly, and improving loop mechanics, rather than finding and
fixing real security gaps.
The underlying cause: LLM intuition is a weak security discovery mechanism. The loop discovered what the model happened to notice by reading config files and thinking about what was missing. This is biased, unscored, and blind to anything the model doesn't know to look for. A security scanner (Lynis, CIS benchmark) produces a complete, prioritized, scored finding list in minutes. The loop's job should be executing remediations from that list, not improvising the list itself.
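The scanner-led alternative can be sketched in a few lines of shell. Lynis writes a machine-readable key=value report alongside its console output; the report fragment below is illustrative sample data, not real scan output, and the sample path stands in for the default /var/log/lynis-report.dat:

```shell
#!/bin/sh
# Illustrative fragment of a Lynis machine-readable report.
# A real report is produced by `lynis audit system` (sample data below).
cat > /tmp/lynis-report.sample <<'EOF'
hardening_index=68
suggestion[]=FILE-7524|Consider restricting file permissions|/etc/ssh/sshd_config|
suggestion[]=HRDN-7230|Harden the system by installing at least one malware scanner|-|
EOF

# Extract the scored finding list the loop should work top-down,
# instead of improvising findings from intuition.
grep '^suggestion\[\]=' /tmp/lynis-report.sample \
  | cut -d'=' -f2- \
  | awk -F'|' '{ printf "%s: %s\n", $1, $2 }'
# prints:
#   FILE-7524: Consider restricting file permissions
#   HRDN-7230: Harden the system by installing at least one malware scanner
```

The point is that the loop's input becomes a fixed, ordered work queue rather than whatever the model happened to notice that iteration.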
Most wasted effort traces to a single chain that compounded across ~30 iterations:
1. Passwordless sudo removed (a correct finding, around iteration 25), but this broke Ansible's become: true tasks for the rest of the run.
2. Deployment fell back to ad-hoc shell bypasses — chflags nouchg tricks, heredoc
writes, manual cp — because the authoritative deployment mechanism no longer worked.
3. Ad-hoc fixes updated live state but not playbook source.
4. Next iteration found source still wrong and "re-fixed" it — .mcp.json mode 0644
discovered and fixed 8+ times in one evening. Each fix landed in the deployed file; the
playbook source regressed back each time.
5. Diversity enforcement couldn't catch it — the loop saw different filenames and tool calls each time, so the semantic duplicate wasn't detected.
Fix: add a deployment verification gate. After each fix, run ansible-playbook --check
and confirm zero drift. A finding is not closed until the playbook enforces it durably.
protect-sensitive.sh consumed multiple days across the entire run — fixing fnmatch
argument order, adding Grep/Glob matchers, closing shell quoting bypasses, managing
source/deployed divergence. It was removed on day 4 as "fundamentally incompatible with
agent autonomy."
This failure mode is architectural, not implementation. The agent needs to bypass the hook to deploy the hook. Every bypass pattern written to deploy it is a pattern that could be exploited to circumvent it. The controls that are robust on an AI workstation are OS-level: filesystem permissions, firewall rules, sudo policy. These don't require the agent's cooperation to enforce.
For future loops: evaluate the architecture of a proposed control in iteration 1. If it requires the agent to bypass itself to deploy itself, that's a disqualifying flaw.
The loop had no P0/P1/P2 concept. AutomaticallyInstallMacOSUpdates missing from playbook
(zero live risk — the live machine already had it set) competed for iteration budget with
firewall completely off (live exposure, immediately exploitable).
"Playbook drift" — settings applied live but not encoded in the playbook — dominated later iterations. These are real but low urgency: the machine is already protected; only a rebuild would be affected. They should be a separate, batched pass, not interleaved with live-gap finding.
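A batched drift pass could look like the sketch below; the backlog file path, the P2-DRIFT tag format, and the helper name defer_drift are all invented for illustration:

```shell
#!/bin/sh
# Sketch: defer playbook-drift findings into a batch backlog instead of
# fixing them inline. Filename, tag, and helper name are assumptions.
BACKLOG=/tmp/drift-backlog.txt

defer_drift() {
  # $1 = finding id, $2 = description
  printf 'P2-DRIFT %s %s\n' "$1" "$2" >> "$BACKLOG"
}

: > "$BACKLOG"
defer_drift "MCP-0644" ".mcp.json mode not encoded in playbook"
defer_drift "AUTOUPD" "AutomaticallyInstallMacOSUpdates missing from playbook"

# The batched cleanup pass later works this list in one sitting:
sort "$BACKLOG"
```

The core loop stays focused on live gaps; the backlog is drained in a dedicated pass where every item gets the same treatment (encode in playbook, verify with --check).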
14+ of 68 commits were on the loop itself. The Python rewrite was correct — bash was genuinely untestable under the pace of iteration — but it consumed most of day 2. A more stable initial implementation would have recovered this time for actual findings.
Real findings with real impact. Application Firewall completely off, passwordless sudo
grant, world-readable API keys (exports.sh 0644, .mcp.json 0644), screen lock
unconfigured. These were worth finding and are fixed.
Fully autonomous for 4 days. No human intervention was required beyond Discord steering. The operator directives channel proved the steering mechanism works.
The audit log. Net-new capability that survived the loop. The JSONL record is what made this RCA possible.
Python rewrite quality. 35 unit tests, clean structure. The loop is now maintainable for future runs.
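The JSONL audit record can be mined for exactly the re-finding pattern this RCA describes. The field names and sample records below are assumptions for illustration, not the real log schema:

```shell
#!/bin/sh
# Sketch: count how often the same finding recurs in the loop's JSONL
# audit log. Schema and records below are invented sample data.
cat > /tmp/audit.sample.jsonl <<'EOF'
{"iter": 41, "finding": ".mcp.json mode 0644", "action": "chmod 600"}
{"iter": 44, "finding": ".mcp.json mode 0644", "action": "chmod 600"}
{"iter": 47, "finding": ".mcp.json mode 0644", "action": "chmod 600"}
{"iter": 48, "finding": "screen lock unconfigured", "action": "defaults write"}
EOF

# Any finding appearing more than once signals a non-durable fix.
grep -o '"finding": "[^"]*"' /tmp/audit.sample.jsonl | sort | uniq -c | sort -rn
```

Run periodically, a check like this would have surfaced the .mcp.json loop after the second recurrence rather than the eighth.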
| # | Action | Status | Why |
|---|---|---|---|
| 1 | Lead with a scanner. Run Lynis or equivalent before any LLM-driven iteration. Loop works the scored finding list top-down. | done — Lynis installed (68/100, findings actioned below); NIST mSCP installed for macOS-native CIS Level 1 auditing | Replaces intuition-based discovery with ground truth |
| 2 | Add deployment verification gate. After each fix: ansible-playbook --check. Finding not closed until playbook enforces it with zero drift. | open | Breaks the re-finding loop caused by ad-hoc deployments |
| 3 | Separate playbook-drift pass. Tag and defer "live but not in playbook" findings. Run them as a batched cleanup, not interleaved with live-gap work. | open | Eliminates low-urgency noise from the core loop |
| 4 | Tier findings before acting. P0 = live exploitable now, P1 = exploitable on rebuild, P2 = defense-in-depth. Address in order. | open | Prevents low-impact work crowding out high-impact work |
| 5 | Architecture review in iteration 1. If a proposed control requires the agent to bypass itself to deploy, reject it before iterating. | open | Avoids the protect-sensitive.sh sunk cost |
| 6 | Cap meta-work. Loop mechanics improvements are scheduled, not reactive. One meta-commit per 10 finding commits maximum. | open | Protects finding time from loop self-improvement |
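Action 6 could be enforced mechanically. A sketch assuming a "meta:" commit-subject prefix convention; the sample log stands in for real git log --oneline output:

```shell
#!/bin/sh
# Sketch: enforce the meta-work cap (at most 1 meta commit per 10 finding
# commits). The "meta:" prefix convention and sample log are assumptions.
log='a1b2c3 meta: rewrite loop in python
d4e5f6 fix: enable application firewall
a7b8c9 fix: chmod 600 exports.sh
c0d1e2 fix: remove passwordless sudo grant'

meta=$(printf '%s\n' "$log" | grep -c ' meta:')
total=$(printf '%s\n' "$log" | grep -c '.')

# Flag when meta commits exceed 1 in 10 of all commits.
if [ $((meta * 10)) -gt "$total" ]; then
  echo "META CAP EXCEEDED: $meta meta of $total commits"
fi
# prints: META CAP EXCEEDED: 1 meta of 4 commits
```

Against the actual run (14+ meta commits of 68) this check would have tripped early on day 2.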
| Tool | Source | Scope | Run with |
|---|---|---|---|
| Lynis | brew install lynis | Linux-first, non-privileged checks | lynis audit system |
| rkhunter | brew install rkhunter | Rootkit detection | rkhunter --check |
| NIST mSCP | Cloned to ~/tools/macos_security (tahoe branch) | macOS CIS Level 1, macOS-native checks | sudo ~/tools/macos_security/build/cis_lvl1/cis_lvl1_compliance.sh |
mSCP requires root and generates a compliance script from the CIS Level 1 baseline. Re-generate after OS upgrades:
cd ~/tools/macos_security && .venv/bin/python3 scripts/generate_guidance.py baselines/cis_lvl1.yaml -s
Hardening index: 68/100, from a non-privileged run (6 tests skipped). No warnings, 14 suggestions.
| Finding | Lynis ID | Severity | Action taken |
|---|---|---|---|
| /etc/ssh/sshd_config is mode 644, should be 600 | FILE-7524 | P1 | Added playbook task — applies on next -K run |
| /var/spool/uucp is mode 755, should be 750 | HOME-9304 | P2 | Added playbook task — applies on next -K run |
| No malware/rootkit scanner installed | HRDN-7230 | P1 | rkhunter installed and added to playbook |
| Compilers world-executable | HRDN-7222 | N/A | Not actionable — /usr/bin/clang etc. are SIP-protected |
| PAM password strength not configured | AUTH-9262 | N/A | Not applicable — macOS uses its own auth stack, not PAM |
| Symlinked mount points | FILE-6310 | N/A | Expected macOS behavior (/home, /tmp, /var are symlinks) |
| Apache mod_evasive/modsecurity missing | HTTP-6640/6643 | N/A | Apache not running as a production server |
| DNS domain name not configured | NAME-4028/4404 | N/A | Not relevant for this workstation |
| No package audit tool | PKGS-7398 | N/A | Trivy already covers this |