Deprecated 2026-05-10. The healthcheck CronJob was suspended on pai-m1 because its state-change posts were noise more than signal. The daily diagnostic loop is now
pai-self-improver, which mines OpenObserve for recurring patterns and proposes memory changes for human approval. The healthcheck agent file is still on disk for ad-hoc invocation
(claude --agent healthcheck) when you want a quick pod / ArgoCD / Vault read. The old shell-only healthcheck.sh cron lives at infra/ai-agents/cronjobs/scripts/healthcheck.sh if you ever want to flip it back on temporarily.
The health check agent (claude --agent healthcheck) runs a systematic
check of every workload in the pai-m1 K8s cluster. It uses kubectl for
pod/app status, the OpenObserve MCP for log error scanning, and the
Linear MCP for bug filing with deduplication.
Source: .claude/agents/healthcheck.md
claude --agent healthcheck
No arguments needed. The agent runs all checks and outputs a summary table.
| Check | Method | Pass Condition |
|---|---|---|
| Pod health | kubectl get pods -A | All pods Running/Completed |
| ArgoCD sync | kubectl get applications -n argocd | All apps Synced + Healthy |
| CronJob health | kubectl get jobs -n ai-agents | Latest run per CronJob succeeded |
| Log errors | o2_error_summary (1h) | No error-level logs |
| Vault status | vault status | sealed=false, initialized=true |
| Log ingestion | o2_list_streams | k8s_logs stream has recent docs |
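
As a rough illustration of the pod health and Vault status rows, the pass conditions could be evaluated on captured command output along the following lines. This is only a sketch, not the agent's actual logic: the helper names failing_pods and vault_ok are hypothetical, and the field positions assume the default column layout of `kubectl get pods -A --no-headers` and the default table output of `vault status`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the agent); both read captured
# command output on stdin so they can be tried without a cluster.

# Print "namespace/pod status" for every pod whose STATUS column
# (4th field of `kubectl get pods -A --no-headers`) is neither
# Running nor Completed. Lines with fewer than 4 fields are ignored.
failing_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Exit 0 only if the `vault status` table reports
# Initialized=true and Sealed=false; exit 1 otherwise.
vault_ok() {
  awk '$1 == "Initialized" { init = $2 }
       $1 == "Sealed"      { sealed = $2 }
       END { exit !(init == "true" && sealed == "false") }'
}

# Usage against a live cluster (assumes kubectl and vault are configured):
#   kubectl get pods -A --no-headers | failing_pods
#   vault status | vault_ok && echo "vault ok"
```

Reading from stdin keeps the checks separate from cluster access, so the same functions work on saved output when debugging a past run.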
When issues are found, the agent files a bug in Linear, marked health-check and titled in the form:
[health-check] namespace/workload: description

The agent knows what should be running and flags missing workloads.
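
The title format and the deduplication mentioned earlier could be sketched as below. How the agent actually queries Linear through the MCP is not described here, so this sketch assumes the existing issue titles are already available as plain text, one per line, and that an exact title match counts as a duplicate. The helper names bug_title and already_filed are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the title format and a
# title-based dedup check; not the agent's real implementation.

# Build a bug title: [health-check] namespace/workload: description
bug_title() {
  printf '[health-check] %s/%s: %s\n' "$1" "$2" "$3"
}

# Succeed if the exact title ($1) already appears among the existing
# titles on stdin (one per line), meaning no new bug should be filed.
# -F: fixed string, -x: whole-line match, -q: quiet.
already_filed() {
  grep -Fxq -- "$1"
}

# Example with made-up data:
existing_titles='[health-check] vault/vault-0: pod in CrashLoopBackOff'
title=$(bug_title vault vault-0 "pod in CrashLoopBackOff")
if printf '%s\n' "$existing_titles" | already_filed "$title"; then
  echo "skip: already filed"
else
  echo "file: $title"
fi
```

Exact matching is the simplest possible rule; a real dedup step might instead match on the namespace/workload prefix so that a changed description does not open a second bug.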
The agent is currently run manually. The plan is to run it as a K8s CronJob (like the journalist) on a schedule, posting results to Discord and filing bugs automatically.