A “casual human” voice is less about slang and more about signals of lived authorship: uneven rhythm, selective detail, occasional blunt opinions, and an absence of templated scaffolding (preambles, disclaimers, recap paragraphs). Linguistics research on AI-generated text repeatedly finds that “AI-ish” output clusters around formality/impersonality, predictable discourse structure, and repetitiveness—though results vary by genre, prompt, and model family.
For Anthropic’s Claude 4.6 generation, the most reliable levers are: (a) persona selection via system prompt (explicit role + voice constraints), (b) canonical few-shot examples (3–5 before/after pairs) rather than huge rule lists, and (c) tight context organisation using clear sectioning (XML-style tags) so the model can separate instructions from the editable draft.
In Anthropic’s own prompt guidance, putting longform data near the top and the query at the end can lift quality materially (they cite up to ~30% in tests for complex inputs), which maps well to “Reviewer agent” workflows where the draft is long and the transformation instructions are short.
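That ordering is easy to enforce mechanically. The sketch below assembles a user message with the draft first and the short transformation request last; the helper name and tag choice are illustrative, not an Anthropic API:

```python
def build_reviewer_messages(draft_md: str, instructions: str) -> list[dict]:
    """Assemble a user message with the long draft near the top and the
    short transformation request at the end, per the long-context guidance."""
    content = (
        f"<draft>\n{draft_md}\n</draft>\n\n"  # longform data first
        f"{instructions}"                      # query/instructions last
    )
    return [{"role": "user", "content": content}]

msgs = build_reviewer_messages("# My post\n...", "Rewrite this in the house voice.")
```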
The style target here is anchored in Kyle Pericak’s blog voice: short, declarative sentences; small paragraphs; blunt framing (“Two things in this post…”); explicit choice rationales (“I picked it because…”); occasional dry humour (“dead internet”); and strong aversion to AI prose (“AI wrote the first draft. It was bad.”).
Model-specific differences that matter for prompt design:
This report prioritises primary sources from the Claude Developer Platform and Anthropic release materials for:
"system" role and supported sampling parametersAI “tell” taxonomy is grounded primarily in peer-reviewed or archival research surveys and detection papers (GLTR/DetectGPT/watermarking), plus recent linguistic meta-analyses of stance/metadiscourse differences.
Kyle-style analysis uses a small but representative sample of posts:
The user’s “Sonet” appears to be a misspelling of Sonnet; the official model IDs are claude-opus-4-6 and claude-sonnet-4-6.
AI “tells” are best treated as a weighted bundle, not a single smoking gun. Research surveying AI-text features finds common signals around formality/impersonality, lexical diversity differences, repetitive patterns, and genre-dependent discourse habits—while also noting that prompt sensitivity is under-addressed in much prior work, meaning the same model can look “more human” under different prompting.
| Tell class | What it looks like in prose (compressed examples) | Detection heuristics you can operationalise | Prompt / workflow countermeasure |
|---|---|---|---|
| Lexical: “corporate neutral” | “robust”, “leverage”, “seamless”, “delve”, “unlock value” | Count “MBA verbs”; compare against a site-specific stoplist | Replace with plain verbs; force concrete nouns |
| Lexical: low idiolect | Few personal favourites (no recurring quirks) | Track author-specific phrases across posts; flag absence | Inject house quirks (“That’s it.” / “No dashboard required.”) |
| Lexical: narrow synonyms | Repeated adjective stacks (“significant”, “notable”, “important”) | High bigram reuse; repeated evaluatives every paragraph | Ban “importance” signalling; prefer “because…” clauses |
| Syntax: smooth, clause-heavy | Long balanced sentences; few fragments | Sentence-length variance low; few 1–4 word fragments | Enforce “8–15 words typical; fragments ok” |
| Syntax: uniform paragraph size | Every paragraph ~3–4 sentences | Paragraph histogram too consistent | Allow 1-sentence paragraph punches |
| Punctuation: overly correct | Perfect commas; no rough edges; rare “…” or “?” | Punctuation entropy low; no purposeful “bad” rhythm | Permit rhetorical questions sparingly |
| Discourse markers overdose | “Additionally”, “Moreover”, “Furthermore” | Frequency of additive connectives | Prefer direct adjacency; cut connectors |
| Metadiscourse inflation | Explicit “this article discusses…” / “in this section…” | High rate of self-referential structure talk | Only keep structure talk when it’s useful |
| Topical: generic safe centre | No sharp opinion; no trade-offs; no “I didn’t do that.” | Stance flattening; hedges without commitment | Require explicit decision + rationale |
| Pragmatic: over-helpful | Answers questions unasked; tutorialises basics | Count “step-by-step” scaffolds not requested | Set “assume competent reader” |
| Pragmatic: faux empathy | “I understand how frustrating…” | Empathy templates near task content | Remove unless story context truly warrants |
| Hedging pattern (genre-dependent) | Either too many “may/might” or oddly few hedges but many attitude markers | Compare hedge/booster counts; watch stance imbalance | Force confident where evidence exists; otherwise hard “unknown/TODO” |
| Repetition: recap paras | “To summarise…” every section | Detect repeating “overall/in summary” | Ban conclusion sections by default |
| Coherence: too perfect | No side-tracks; no small inconsistencies | Very high topical smoothness; no local digressions | Allow one purposeful tangent if present in draft |
| Coherence: shallow causality | Lots of “X is important” without “because” | Ratio of justificatory clauses low | Require “because”/evidence lines |
| Over-clarity / over-formatting | Bullet lists everywhere; “key takeaways” | Too-regular outline + “takeaways” | Use lists only when scannability matters |
| Cultural references: generic + global | Vague “today’s world”; no local detail | Low named-entity specificity | Use draft-provided names only; don’t invent |
| Errors: suspiciously clean | No typos; no informal contractions; uniformly polished | 0 typos + high polish | Add contractions; vary rhythm; keep technical correctness |
| Hallucinated specifics | Fake numbers, tools, dates, citations | Claim-density > source support | “No new facts”; allow “I don’t know/TODO” |
| Token-level artefacts (statistical) | Unusually “high-likelihood” token choices or watermark signatures | GLTR-style token rank distribution; DetectGPT curvature | If you can: run detectors; otherwise rely on stylistic rewrite pass |
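Two of the cheaper heuristics from the table, sentence-length variance and additive-connective rate, can be scripted directly. The thresholds you'd apply on top are site-specific; this is a sketch, not a detector:

```python
import re
import statistics

ADDITIVE_CONNECTIVES = {"additionally", "moreover", "furthermore"}

def tell_signals(text: str) -> dict:
    """Crude, illustrative heuristics: low sentence-length variance and a
    high rate of additive connectives are two of the signals tabled above."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    connective_rate = sum(w in ADDITIVE_CONNECTIVES for w in words) / max(len(words), 1)
    return {
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        "additive_connective_rate": connective_rate,
    }
```

A near-zero standard deviation plus a high connective rate is the "smooth, clause-heavy, connector-stuffed" profile in one number pair.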
The patterns above align with published syntheses: surveys report increased formality/impersonality markers (e.g., more nouns/determiners/adpositions) and repetition signals, plus under-studied prompt sensitivity. Organisational studies also report AI text skewing more positive sentiment and narrower vocabulary in workplace contexts. Metadiscourse comparisons in academic genres find systematic differences in stance marker use (often overuse of “attitude” markers), highlighting that the exact “tell” can flip by genre.
Claude 4.6 prompting guidance is converging on three ideas: clear roles, structured context, and canonical examples. The Claude docs explicitly recommend XML tags for separating instructions/context/examples, and recommend 3–5 examples for best results. They also advise that with long inputs (20k+ tokens), placing longform data near the top and queries at the end can improve quality significantly (they report up to ~30% in tests).
In the Messages API, the system prompt is a top-level system parameter; there is no “system role” inside messages. Practically:
- Put the persona and style contract in the top-level system parameter.

A key gotcha for older “format forcing”: Anthropic’s “Increase output consistency” guide states that prefilling is deprecated and not supported on Opus 4.6 / Sonnet 4.6 (so don’t rely on partial assistant-prefill tricks as your main enforcement mechanism).
Sampling controls (Claude API):
- temperature defaults to 1.0 and ranges 0.0–1.0; Anthropic recommends closer to 0 for analytical tasks and closer to 1 for creative/generative tasks.
- top_p is available, and Anthropic recommends altering either temperature or top_p but not both.
- stop_sequences can be used to halt on strings you define.

Thinking controls: the API supports a thinking configuration with types including enabled, disabled, and adaptive. For Opus 4.6, Anthropic recommends adaptive thinking and notes that manual budget mode is deprecated and will be removed in a future release.
The docs refer to an “effort” parameter for adaptive thinking, but the exact field name and enum values are not pinned in the sources sampled here; treat that part as unspecified in your implementation until you confirm in the “Effort” docs page.
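Putting those settings together, a request body for the Reviewer might look like the sketch below. The model ID and the adaptive thinking shape are taken from this report's sources, not verified against the live API; confirm field names in the current API reference before relying on them (the unconfirmed "effort" field is deliberately omitted):

```python
def reviewer_request_body(system_prompt: str, user_content: str) -> dict:
    """Sketch of a Messages API request body: top-level system parameter,
    a low temperature for an analytical rewrite task, and adaptive thinking
    as recommended for Opus 4.6. Treat the exact shape as an assumption."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 8192,
        "system": system_prompt,           # no "system" role inside messages
        "temperature": 0.3,                # analytical task: closer to 0
        "stop_sequences": ["</rewrite>"],  # optional halt string you define
        "thinking": {"type": "adaptive"},
        "messages": [{"role": "user", "content": user_content}],
    }
```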
Anthropic’s own context engineering guidance warns against stuffing laundry-list rule sets and instead recommends curating diverse, canonical examples (“pictures worth a thousand words”). Given that, the most robust formats for “casual human prose” are:
| Prompt format | Why it helps with “casual human” | Opus 4.6 fit | Sonnet 4.6 fit |
|---|---|---|---|
| Role + concise style contract | Pushes the model into a human author persona (not generic assistant) | Excellent; can handle nuance and longer constraints | Excellent; keep it tighter (less prose about rules) |
| Canonical before/after pairs (3–5) | Demonstrates the rhythm and “what to delete” | Strong; can absorb more examples and subtleties | Strong; examples matter more than abstract rules |
| XML/tagged prompt sections | Prevents instruction bleed into rewritten prose | Strong, especially on long drafts | Strong; reduces misinterpretation |
| Rubric + self-check rewrite loop | Forces the model to notice and remove “tells” | Very strong; Opus can do multi-pass in one call due to output headroom (128k max) | Good; but keep passes limited to avoid verbosity creep |
| Small banned-phrase list | Deletes the highest-signal “AI trope” strings | Useful, but brittle; keep tiny to avoid rule-overfitting | Useful, but keep tiny; prefer examples |
| Long “do/don’t” mega-lists | Can overspecify and create brittle output patterns | Avoid; use examples instead | Avoid; higher risk of falling into templated compliance |
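Packaging the canonical pairs with XML-style tags is mechanical. A minimal helper, assuming a simple `<example>/<before>/<after>` tag scheme of our own choosing (not an Anthropic-mandated format):

```python
def format_examples(pairs: list[tuple[str, str]]) -> str:
    """Wrap canonical before/after pairs in XML-style tags so the model can
    separate the examples from the draft it is rewriting."""
    blocks = []
    for before, after in pairs[:5]:  # docs recommend 3-5 canonical examples
        blocks.append(
            f"<example>\n<before>{before}</before>\n<after>{after}</after>\n</example>"
        )
    return "\n".join(blocks)

examples = format_examples([
    ("In this post we will explore X.", "Two things: X and Y."),
])
```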
A further practical wrinkle: Claude 4.6 models are described in Anthropic docs as more concise and natural than previous generations, meaning they may skip explicit summaries unless asked. That is actually helpful for “human blog voice” (no forced recap), but it means your Reviewer prompt should only request the minimal meta-output you really want (e.g., a short scorecard + the rewritten post).
Kyle Pericak’s blog has a consistent “working engineer” voice across eras, but the 2025+ posts make the meta-position on AI explicit: use AI for drafts/code generation, but rewrite prose to avoid the “dead internet” vibe.
The 2019–2020 posts read like concise field notes: “here’s the problem; here’s the fix; here are the commands.” The opening paragraphs tend to be plain-spoken and mildly opinionated:
Structurally, these posts use clear headings (“The Problem”, “Commands”, “Initial Setup”), short paragraphs, and code blocks without extra narrative padding.
From late 2025 onward, the voice has more explicit meta-commentary and sharper stance about AI prose. In the Mermaid post, Kyle directly states that AI drafted the post and that he rewrote the text because AI writing “makes my eyes glaze over” and he doesn’t want to contribute to a “dead internet.”
The newer posts also show a recurring pattern:
Most importantly for your Reviewer agent spec, Kyle literally embeds a style rule-set for his own writer subagent, including: no em-dashes, 8–15 word sentences typical, fragments allowed, short paragraphs, “no AI writing tells”, and no conclusion sections.
That’s effectively your ground-truth style contract.
These are directly inferable from the sampled posts:
A strong Reviewer agent prompt needs two simultaneous constraints:
Anthropic’s “Reduce hallucinations” guidance maps cleanly here: explicitly allow “I don’t know” (or TODO markers), and ground claims in what’s actually present in the input text rather than inventing.
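The "preserve code blocks" half of fidelity is checkable without a model. A simple gate, assuming fenced markdown blocks and ignoring block order for simplicity:

```python
import re

FENCE = re.compile(r"```.*?```", re.DOTALL)

def code_blocks_preserved(draft: str, rewrite: str) -> bool:
    """Fidelity gate: every fenced code block in the draft must appear
    verbatim in the rewrite, and no new blocks may be introduced."""
    return sorted(FENCE.findall(draft)) == sorted(FENCE.findall(rewrite))
```

The "no new facts" half (numbers, tool names, dates) is harder to automate and is better left to the model's own final pass plus a human skim.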
```mermaid
flowchart TD
    A[Input: Draft markdown] --> B[Scan: AI tells + Kyle-style gaps]
    B --> C[Score: rubric with brief notes]
    C --> D[Rewrite pass: restructure + rephrase]
    D --> E[Fidelity check: no new facts, preserve code]
    E --> F[Output: Scorecard + rewritten markdown]
```
Pair: templated intro → Kyle framing
Pair: connector-heavy prose → direct adjacency
Pair: conclusion section → cut
Use this as the quick “did we nail it” gate after each rewrite:
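Part of the gate can run as a script. This sketch checks only the mechanical rules (em-dashes, banned phrases, sentence length, mirroring the hard style rules in the prompts below); voice and fidelity judgements still need the model or a human:

```python
import re

BANNED = ["additionally", "moreover", "furthermore", "in conclusion",
          "it's important to note", "delve"]

def style_gate(text: str) -> list[str]:
    """Return a list of failed mechanical checks; an empty list means the
    rewrite passes this partial gate. Judgement calls are out of scope."""
    failures = []
    lowered = text.lower()
    if "\u2014" in text:  # em-dash
        failures.append("em-dash present")
    for phrase in BANNED:
        if phrase in lowered:
            failures.append(f"banned phrase: {phrase}")
    long_sentences = [s for s in re.split(r"[.!?]+", text) if len(s.split()) > 30]
    if long_sentences:
        failures.append(f"{len(long_sentences)} sentences over 30 words")
    return failures
```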
Below are three system prompts:
You are a Reviewer agent. Your job is to evaluate and rewrite Markdown blog posts so they read like Kyle Pericak’s blog voice: casual working-engineer, blunt, high-signal, not “AI-ish”.
INPUT (from user message)
- <draft> ... </draft> contains the Markdown post to rewrite.
- Optional: <notes> ... </notes> may contain intent, audience, and constraints.
OUTPUT (always exactly two parts, in this order)
1) SCORECARD (max 12 lines): a compact rubric with 1–5 scores + one-line notes.
2) REWRITE: the rewritten Markdown only. No extra commentary.
HARD STYLE RULES (do not break)
- No em-dashes. Use commas or periods.
- Sentences: 8–15 words typical. Fragments are fine for emphasis.
- Paragraphs: 1–5 sentences. No walls of text.
- Headings should be practical (“Why X”, “How it works”, “The stack”, “Cost”, “Wiring it up”).
- Prefer blunt framing over mission statements: “Two things in this post…” / “This is X.” / “I’m doing Y because…”.
- Use contractions naturally (I’ve, I’m, it’s, that’s).
- Delete filler connectives: “Additionally”, “Moreover”, “Furthermore”, “In conclusion”, “It’s important to note”, “Delve”.
- No recap/conclusion section unless the draft explicitly needs it for a reason.
- No “as an AI”, no policy/procedure disclaimers, no self-congratulation.
FACTUAL FIDELITY RULES (must hold)
- Do NOT add new facts, numbers, tool names, prices, dates, benchmarks, or claims that aren’t already in <draft> or <notes>.
- Preserve all code blocks exactly unless there is an obvious typo already present in the draft AND the correction is clearly implied by surrounding text. If unsure, leave it.
- If the draft is missing an important detail, insert: “TODO: <what’s missing>”.
- Keep links intact; do not invent citations.
RUBRIC (1–5 each)
- Human voice plausibility (reads like one engineer wrote it)
- Cadence & paragraphing (short, varied, scannable)
- Signal density (no fluff; reasons are explicit)
- AI-tell absence (no templates, no “bloggy” filler)
- Fidelity (no new facts; code preserved)
TRANSFORMATION METHOD
- First scan and list the top 5 AI tells you see (privately).
- Rewrite by removing templates, tightening sentences, and making the “why” explicit.
- Do a final pass to: remove banned phrases, remove em-dashes, and ensure no new facts were added.
MINI BEFORE/AFTER EXAMPLES (match this direction)
- “In this post we will explore X…” -> “Two things: X and Y. Here’s what I changed.”
- “Additionally, it’s important to note…” -> delete or rewrite as a direct sentence.
- “In conclusion…” -> delete; end on the last useful command/output or a blunt closing line.
Use the Base system prompt rules, plus:
- Do a two-pass rewrite: (1) restructure + rewrite, (2) cold reread and tighten.
- Be more aggressive about deleting redundant explanation and “teaching tone”.
- If the draft contains multiple ideas, split into clearer sections and rename headings for clarity.
- Keep the SCORECARD extremely short (5 lines max), then output the rewritten Markdown.
Use the Base system prompt rules, plus:
- One-pass rewrite only (avoid extra meta).
- Prefer minimal edits that achieve the voice: tighten, cut filler, add blunt framing, fix headings.
- Do not expand the post. If you add text, it must be either (a) clearer “why” using existing facts, or (b) a TODO marker.
- Keep the SCORECARD to exactly 5 rubric lines (no extra notes).