The health check agent (`claude --agent healthcheck`) runs a systematic
check of every workload in the pai-m1 K8s cluster. It uses `kubectl` for
pod/app status, the OpenObserve MCP for log error scanning, and the
Linear MCP for bug filing with deduplication.

Source: `.claude/agents/healthcheck.md`

Usage: `claude --agent healthcheck`

No arguments are needed. The agent runs all checks and outputs a summary table.

| Check | Method | Pass Condition |
|---|---|---|
| Pod health | `kubectl get pods -A` | All pods Running/Completed |
| ArgoCD sync | `kubectl get applications -n argocd` | All apps Synced + Healthy |
| CronJob health | `kubectl get jobs -n ai-agents` | Latest run per CronJob succeeded |
| Log errors | `o2_error_summary` (1h) | No error-level logs |
| Vault status | `vault status` | sealed=false, initialized=true |
| Log ingestion | `o2_list_streams` | `k8s_logs` stream has recent docs |

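
The pod and CronJob pass conditions in the table can be sketched as pure functions over the parsed output of `kubectl get pods -A -o json` and `kubectl get jobs -n ai-agents -o json`. This is a minimal illustration, not the agent's actual logic: the function names are hypothetical, "succeeded" is assumed to mean `status.succeeded >= 1`, and a Job that is still running is treated as not yet succeeded. Note that kubectl shows pods in the `Succeeded` phase as `Completed`, and that this sketch checks only the pod phase, so a pod whose STATUS column reads `CrashLoopBackOff` may still report phase `Running`.

```python
# Phases treated as healthy; kubectl displays Succeeded pods as "Completed".
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return 'namespace/name (phase)' for every pod outside HEALTHY_PHASES."""
    bad = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod["metadata"]
            bad.append(f"{meta['namespace']}/{meta['name']} ({phase})")
    return bad


def failed_cronjobs(job_list: dict) -> list[str]:
    """Return names of CronJobs whose most recent Job did not succeed."""
    latest = {}  # CronJob name -> (startTime, succeeded?)
    for job in job_list.get("items", []):
        owners = job["metadata"].get("ownerReferences", [])
        cron = next((o["name"] for o in owners if o.get("kind") == "CronJob"), None)
        if cron is None:
            continue  # Job was not created by a CronJob
        status = job.get("status", {})
        # startTime is an RFC 3339 UTC string, so string comparison orders by time.
        start = status.get("startTime", "")
        ok = status.get("succeeded", 0) >= 1
        if cron not in latest or start > latest[cron][0]:
            latest[cron] = (start, ok)
    return sorted(name for name, (_, ok) in latest.items() if not ok)
```

Both functions return an empty list when the corresponding check passes, which makes them easy to map onto a pass/fail row in a summary table.
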
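When issues are found, the agent files a bug through the Linear MCP, deduplicated against existing issues and labeled `health-check`, with a title of the form `[health-check] namespace/workload: description`.

The agent knows what should be running and flags missing workloads.
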
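
The title pattern `[health-check] namespace/workload: description` and the missing-workload check can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, deduplication is assumed to be an exact match on the issue title (the source does not say how the agent actually deduplicates), and the expected-workload list is whatever the caller supplies.

```python
def bug_title(namespace: str, workload: str, description: str) -> str:
    """Format a bug title in the '[health-check] namespace/workload: description' pattern."""
    return f"[health-check] {namespace}/{workload}: {description}"


def titles_to_file(findings, existing_titles) -> list[str]:
    """Return titles for findings that do not already have an issue.

    findings: iterable of (namespace, workload, description) tuples.
    existing_titles: titles of issues already filed (assumed exact-match dedup).
    """
    seen = set(existing_titles)
    new = []
    for namespace, workload, description in findings:
        title = bug_title(namespace, workload, description)
        if title not in seen:
            seen.add(title)  # also dedupes repeats within this run
            new.append(title)
    return new


def missing_workloads(expected, running) -> list[str]:
    """Return expected 'namespace/workload' names that are not currently running."""
    return sorted(set(expected) - set(running))
```

For example, `missing_workloads(["ns-a/api", "ns-a/worker"], ["ns-a/api"])` yields `["ns-a/worker"]`; the names here are placeholders, not workloads from the pai-m1 cluster.
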
The health check is currently run manually. The plan is to run it as a K8s CronJob (like the journalist) on a schedule, posting results to Discord and filing bugs automatically.