
I had Claude credits left for the week and wanted to secure my agent-dedicated MacBook. It ran on my Mac M1 for a few days while I was on vacation. I made it kind of fancy with Discord support and remote management through Tailscale; it was really fun getting notified every now and then that another commit had been pushed to the open PR.
It used a DIY MVP version of a Ralph loop. I am the supply chain! ...The bots tell me they won't draw Simpsons characters for me and I'm not about to break out an image editor, so we get this footless child instead.
Look at the commits in this PR, over 80! A common thing with AI is you get all these impressive stats that way oversell the value they represent. These 80 commits are cool, but if I'd done it by hand it would have been like... 4, maybe 5.
As another win, I used my PRD and DD agents from the prior post to design this out and that worked wonderfully.
Ralph is an autonomous agent pattern that runs Claude Code in a loop until a task list is empty. The task list here was dynamic, Claude kept looking for new ways to improve laptop security without hurting agent autonomy. Later it crunched through security findings from compliance scanners. The agent read each finding, fixed it through Ansible, verified the fix, and moved to the next one. I steered it from Discord a bit.
The machine is a Mac M1 running as an AI workstation. This is "Claude's laptop", and I factory reset it to remove anything Claude shouldn't have. I segmented it from my network too, just in case.
Claude Code runs in bypass-permissions mode with unrestricted tool access. The entire machine config is managed by an Ansible playbook, so any change the agent makes is reproducible on a fresh install.
Before this project, I'd done some security work setting up pre-commit hooks for semgrep and gitleaks, a security toolkit Docker image, and hook scripts that blocked known-bad commands.
This loop was okay, I guess. The idea of the loop is really exciting. I can't wait to get a loop of, like, "work on anything in Linear" or something similarly broad that is able to decompose stuff properly.
Anyways, for this project, the focus was limited to laptop security. A Python script spawns Claude Code every few minutes to stay within the 5-hour token window. We never really went above 50%. I had a custom token-usage MCP server counting things up too, but the math is fuzzy, especially since Claude was offering double quota at night that week.
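The orchestrator is simple at heart: spawn a fresh headless Claude Code session on an interval so each run starts with a small context. A minimal sketch, assuming the standard `claude -p` headless flags; the prompt, interval, and function names are illustrative, not my actual script:

```python
import subprocess
import time

PROMPT = "Review the security findings log and fix the highest-priority item."
INTERVAL_SECONDS = 5 * 60  # a fresh session every few minutes


def build_command(prompt: str) -> list[str]:
    # Headless one-shot run; bypass-permissions mirrors the interactive setup
    return ["claude", "-p", prompt, "--dangerously-skip-permissions"]


def run_iteration(prompt: str, timeout_s: int = 20 * 60) -> int:
    """Spawn one Claude Code run and return its exit code."""
    result = subprocess.run(
        build_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode


def main() -> None:
    """Run forever: one fresh session per interval."""
    while True:
        run_iteration(PROMPT)
        time.sleep(INTERVAL_SECONDS)
```

Spawning a new process each iteration, rather than keeping one long session alive, is what keeps token usage bounded.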
Each iteration:
```mermaid
stateDiagram-v2
    [*] --> Ideate
    Ideate --> PickFinding
    PickFinding --> FixInPlaybook
    FixInPlaybook --> Deploy
    Deploy --> Verify
    Verify --> Commit: pass
    Verify --> PickFinding: fail (retry)
    Commit --> Sleep
    Sleep --> ReadLog
```
The loop had those cost/token controls, a lock file so only one instance ran at a time, and a per-phase timeout to kill things when they got stuck. It also had an adversarial subagent loop, I love those, to make sure the changes actually worked, helped security, and didn't hurt agent autonomy. It pushed to GitHub and pinged me on Discord after each verified commit, so I could review from my phone.
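The lock file and per-phase timeout are the boring-but-essential parts. A sketch of one way to do both, assuming a simple PID-file lock (the lock path and phase commands are made up):

```python
import os
import subprocess

LOCK_PATH = "/tmp/ralph-loop.lock"  # illustrative path


def acquire_lock(path: str = LOCK_PATH) -> bool:
    """Atomically create the lock file; fail if another instance holds it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def release_lock(path: str = LOCK_PATH) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_phase(cmd: list[str], timeout_s: int) -> bool:
    """Run one phase of the loop; kill it if it exceeds the timeout."""
    try:
        subprocess.run(cmd, timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
```

`os.O_CREAT | os.O_EXCL` makes lock creation atomic, so two overlapping spawns can't both think they own the loop.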
"Claude, find me some ways to make my laptop secure and implement the best one through Ansible".
It found a few real, if basic, issues. Things like:
- A sudoers.d file granted passwordless sudo to the agent user
- exports.sh and .mcp.json were world-readable (mode 0644)

But 60 iterations later, most of the time had gone to things that didn't matter or couldn't work.
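The world-readable findings are the kind of thing that's trivial to check mechanically. A hedged sketch of that check; the file list is just the examples above, not a real inventory:

```python
import os
import stat


def world_readable(path: str) -> bool:
    """True if the 'other' read bit is set on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


# Example: flag any sensitive file still readable by other users
SENSITIVE = ["~/.mcp.json", "~/exports.sh"]  # paths from the findings above

for p in SENSITIVE:
    full = os.path.expanduser(p)
    if os.path.exists(full) and world_readable(full):
        print(f"world-readable: {full}")
```

A 0644 file passes this check's alarm (other-read bit set); chmod to 0600 clears it.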
It turns out you need to be proactive about preventing loops.
During this test, Claude spent multiple days iterating on a hook script called protect-sensitive.sh.
It'd move on, then keep coming back to it. The hook itself just tried to prevent the agent from reading sensitive files.
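For context, a Claude Code PreToolUse hook like that receives a JSON payload on stdin and blocks the tool call by exiting with code 2. A minimal Python sketch of the idea, with an illustrative sensitive-path list (my actual hook was a shell script):

```python
import json
import sys

# Illustrative list; anything the agent shouldn't read
SENSITIVE_PREFIXES = ("/Users/agent/.ssh", "/etc/sudoers")


def should_block(payload: dict) -> bool:
    """Block tool calls whose target path starts with a sensitive prefix."""
    path = payload.get("tool_input", {}).get("file_path", "")
    return path.startswith(SENSITIVE_PREFIXES)


def main() -> None:
    # Claude Code pipes the tool call details to the hook as JSON on stdin
    payload = json.load(sys.stdin)
    if should_block(payload):
        print("blocked: sensitive path", file=sys.stderr)
        sys.exit(2)  # exit code 2 tells Claude Code to block the tool call
    sys.exit(0)
```

The irony is that the guard logic is a few lines; the loop's days of churn were on edge cases around it.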
I had given some instructions to not get stuck in loops like that, but I guess it didn't take.
After 68 commits in the week, I had Claude run an RCA.
LLM intuition is a weak security discovery mechanism. The loop should execute against a scanner's scored finding list, not improvise the list itself.
The result looked really good, 68 commits in 3 days is a cool stat, but the actual value delivered was pretty meh. You need to be really intentional and good about goal-setting.
The RCA asked if it could just use a scanner, like I did with the SCA stuff, and that immediately resulted in way more fixes and a much better loop.
The RCA recommended three scanners:
| Tool | Scope | Why |
|---|---|---|
| Lynis | General system audit | Broad, scored, Linux-first but works on macOS |
| rkhunter | Rootkit detection | Catches things Lynis doesn't check |
| mSCP | macOS CIS Level 1 | NIST-maintained, macOS-native compliance checks |
All three install through the Ansible playbook. A LaunchDaemon
runs them daily at 06:00 and writes results to
/var/log/security-scans/.
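A LaunchDaemon is just a plist, so the playbook can template it. A sketch of generating an equivalent one with Python's `plistlib`; the label and script path are my guesses at the shape, not the actual files:

```python
import plistlib

# Hypothetical label and paths; the real ones live in the Ansible playbook
daemon = {
    "Label": "com.example.security-scans",
    "ProgramArguments": ["/usr/local/bin/run-security-scans.sh"],
    "StartCalendarInterval": {"Hour": 6, "Minute": 0},  # daily at 06:00
    "StandardOutPath": "/var/log/security-scans/scan.log",
    "StandardErrorPath": "/var/log/security-scans/scan.err",
}

with open("com.example.security-scans.plist", "wb") as f:
    plistlib.dump(daemon, f)
```

Installed to /Library/LaunchDaemons/ and loaded with `launchctl`, `StartCalendarInterval` fires the scan script at 06:00 every day.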
It was admittedly kind of a pain in the ass to help Claude escalate perms to root over and over during the mSCP work; some of those tests need root to run, and Claude couldn't trigger them itself.
Other than the fun of having it work while I was away, here's what Claude tells me it implemented to secure my laptop. Worth noting that this was all done through Ansible, so when I factory reset my laptop again (I will), I get it all back too:

- sshd_config permissions set to 0600 so it can't be read by other users

So overall, pretty meh. It was fun though, and I think if done well and left to run perpetually, it could be a neat approach to having an agent actively defend my system.