Across many engineering organisations, the zeitgeist has shifted from "docs as supporting artefacts" to "docs as executable, agent-steerable specifications". This shift is largely a response to agentic coding workflows: models can execute multi-file changes, and the limiting factor becomes the quality, stability, and structure of the plan they are given.
In practitioner field notes, a recurring bottleneck is not raw code generation, but handoffs—requirements → design → implementation—where rationale and constraints get lost ("context goes to die"). This is precisely where AI agents tend to underperform: they are strong inside a bounded subproblem, but unreliable at reconstructing missing intent, implicit decisions, or undocumented constraints.
A second driver is traceability and governance. When design decisions live primarily in chat transcripts, teams often lack a durable audit trail for "why we built it this way". Spec-driven toolchains (for example, workflows that explicitly separate requirements, technical design, and task sequencing) are an attempt to make the artefacts themselves durable, reviewable, and "replayable" by humans and agents.
Publicly available, high-signal templates and examples from Google, Uber, GitLab, Shopify, and Stripe show a family resemblance: they converge on a small set of repeatable "decision forcing functions" (context, goals, proposal, alternatives), then diverge in how much they standardise cross-cutting concerns (security, privacy, performance, rollout, etc.).
A widely-circulated description of "Google-style" design docs emphasises that the context-and-scope section is explicitly not a requirements document, and that the enduring value lies in documenting trade-offs and alternatives, not in serving as an implementation manual. The most common "spine" is:

- Context and scope
- Goals and non-goals
- The actual design
- Alternatives considered
- Cross-cutting concerns
This same source notes a "mini design doc" can be 1–3 pages for incremental work, while larger projects often land in a ~10–20 page zone; beyond that, splitting the problem is recommended.
One public Uber template (in an open-source repo) is extremely lightweight: metadata plus Abstract, Motivation, Approaches, and Proposal. That minimal template is useful for speed and review throughput.
However, practitioner retrospectives of Uber's historical RFC practice (and how templates evolve at scale) show much more expansive section sets for service changes and mobile work—explicit checklists for SLAs, dependencies, load/performance testing, multi-datacentre concerns, security, rollout, monitoring, support considerations, accessibility, and more.
This is an important pattern when designing for AI agents: some orgs maintain tiered templates (lightweight for small, local changes; heavyweight for high-impact, cross-team, or safety-critical work).
GitLab publishes "Architecture Design Documents" as version-controlled artefacts that are intended to be updated over time as understanding evolves. That "living doc" stance is directly relevant for agent consumption, because agents will otherwise follow stale decisions.
Recent public examples, including the LabKit design document discussed below, illustrate the section structure GitLab uses in practice.
Notably, the LabKit document explicitly calls out "documentation drift" risks when there is "no machine-readable specification" acting as a contract—an unusually direct statement about why formal schemas matter (and why they help both humans and agents).
Shopify describes using Engineering RFCs (in large programmes) as an async-friendly mechanism to align quickly on smaller technical design areas, with a template and "rules of engagement" in GitHub; if no one explicitly vetoes by the deadline, the RFC author decides how to proceed.
The linked template is concise and operational; its title line encodes status and scope directly:

[OPEN until <date>][COMPONENT] TITLE

For AI agent consumption, this template is interesting because it bakes in two agent-relevant constructs: (1) explicit deadlines (useful for deciding when a proposal is "final"), and (2) an explicit section on why not to build it (a compact trade-off and rejection rationale).
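As an illustration of construct (1), a deadline encoded in the title is trivially machine-checkable. The sketch below is a hypothetical parser, not part of Shopify's template: the regex, function names, and ISO date format are assumptions.

```python
import re
from datetime import date

# Hypothetical parser for titles of the form "[OPEN until <date>][COMPONENT] TITLE".
# The ISO date format and function names are illustrative assumptions.
TITLE_RE = re.compile(r"\[OPEN until (\d{4}-\d{2}-\d{2})\]\[([^\]]+)\]\s*(.+)")

def parse_rfc_title(title: str) -> dict:
    """Extract deadline, component, and title; raise if the format is wrong."""
    m = TITLE_RE.match(title)
    if not m:
        raise ValueError(f"title does not match RFC format: {title!r}")
    deadline, component, name = m.groups()
    return {"deadline": date.fromisoformat(deadline),
            "component": component,
            "title": name}

def is_final(title: str, today: date) -> bool:
    """An RFC is 'final' once its open window has passed."""
    return today > parse_rfc_title(title)["deadline"]
```

An agent (or a bot in CI) could use `is_final` to decide whether a proposal is still open for veto or ready to act on.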
What is strongly evidenced publicly about Stripe is not a specific internal RFC template, but a company-wide, leadership-driven writing culture: leaders model structured narrative communication; writing is treated as a default for conveying ideas; and internal sample docs are developed to help teams learn structure and style.
For section structure specifically, public evidence is thinner. One relevant proxy is Stripe's co-maintenance (with OpenAI) of an RFC-style specification for agentic commerce/checkouts. That RFC example is highly structured: explicit Status, Version, and Scope metadata, followed by numbered sections such as Scope & Goals. While this is an external interoperability spec (not an internal backend design doc), it demonstrates a Stripe-adjacent approach to "machine-readable" spec metadata.
Across practitioner guidance, PRDs and design docs are commonly run side-by-side: PRDs articulate what should be built and why, while engineering design docs/RFCs decide how it will be built (including constraints, trade-offs, and rollout).
A PRD is commonly defined as aligning stakeholders on purpose, behaviours, and scope; it may include acceptance criteria and constraints, but is not meant to dictate the technical implementation. Meanwhile, Google-style design docs explicitly warn that the "context and scope" section is not a requirements doc and should stay succinct.
A practical "minimum viable design doc" (MVD) emerges from the light templates above (Uber's minimal RFC, Shopify's RFC gist, and the core of the Google-style structure): Summary, Context, Goals/Non-goals, Proposal, and Alternatives/trade-offs—with rollout/testing/security added only when material.
A "comprehensive RFC" is what tends to happen when cross-cutting concerns become mandatory and accumulate: multi-region/failover, compute/resource requirements, availability strategies, storage/API design, runbooks/operations, i18n, etc. In one retrospective of large-company RFC practice, a "beast" template had grown to ~14 pages before being filled in, due to required sections accumulated from "special interests". That growth dynamic (helpful for safety, harmful for throughput) is a key design tension for AI-optimised templates.
The core handoff problem is that product intent often arrives with ambiguous "edges" (implicit assumptions, unstated constraints), and the technical plan is forced to reconstruct those edges during implementation. AI-native workflows try to fix this by splitting a "PRD → design doc" translation into explicit intermediate artefacts with checkpoints.
A good PRD typically carries: target user behaviours, success criteria / acceptance criteria, scope boundaries, and any non-negotiable constraints. In spec-driven workflows, these become the "requirements" artefact.
What tends to be left behind (or linked rather than copied) is material that is high-value for go/no-go or prioritisation decisions but low-value for implementation planning: market sizing, competitive positioning, long-form discovery notes, and stakeholder narrative that does not constrain behaviour. This is implied by how PRDs are described as aligning purpose/features/behaviour rather than "how to build", while design docs focus on the technical solution and its trade-offs.
Spec Kit formalises the transition in four phases with explicit checkpoints: Specify (capture the requirements and user stories), Plan (produce the technical design and constraints), Tasks (break the plan into small, ordered tasks), and Implement (execute the tasks, with review at each checkpoint).
This structure makes the PRD-to-design-doc translation explicit: "PRD-like" content is front-loaded into Specify, while "design doc-like" content is captured in Plan, and the remainder becomes executable tasks.
Kiro uses a similar three-file core structure for each "spec":

- requirements.md: user stories and acceptance criteria / bug analysis
- design.md: architecture, sequence diagrams, implementation considerations, error handling, testing strategy
- tasks.md: discrete implementation tasks

This explicitly encodes: requirements → design decisions → an ordered task plan.
Copilot Workspace (before the technical preview ended on 30 May 2025) implemented a PRD-to-plan bridge by generating a "spec" as two bullet lists (current state vs desired state), then generating a plan that enumerates every file to create/modify/delete and what to do in each file—both steps editable by the user before code is generated.
In all three, the design principle is the same: do not ask the agent to infer "how" from a PRD-like statement; instead create a structured intermediate plan artefact that explicitly records constraints, file-level impact, and sequencing.
Agent-facing documents are effective when they reduce ambiguity, compress context into durable artefacts, and give the agent a deterministic notion of (a) what to change, (b) in what order, and (c) how to know it is done.
One strongly evidenced pattern is to include a file-level plan as a first-class element, because agents do better when the "where" is explicit.
One example is keeping plans in .cursor/plans/ as durable documentation for future work and future agents. This is effectively a "diff-shaped" plan: it creates a map from architectural intent to concrete repository touchpoints.
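A diff-shaped plan can be represented as a small, explicit data structure. This is a hypothetical schema; the field names, paths, and rationales below are illustrative assumptions, not any tool's real format.

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative schema for a "diff-shaped" plan: one record per repository
# touchpoint, mapping architectural intent to a concrete file action.
@dataclass(frozen=True)
class PlanStep:
    path: str                                       # repository file affected
    action: Literal["create", "modify", "delete"]   # what happens to it
    rationale: str                                  # why this file changes

# Hypothetical plan for a token-refresh feature:
plan = [
    PlanStep("src/auth/session.py", "modify", "add token refresh hook"),
    PlanStep("src/auth/refresh.py", "create", "new refresh endpoint"),
]

def touched_files(steps: list[PlanStep]) -> set[str]:
    """The explicit 'where' an agent needs: every file the plan touches."""
    return {s.path for s in steps}
```

Because each step carries its rationale, a reviewer (or a second agent) can audit the plan before any code is written.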
A second demonstrated pattern is to separate (1) deterministic orchestration from (2) bounded agent execution.
In field notes from QuantumBlack (McKinsey & Company), successful implementations used deterministic orchestration to enforce phase transitions, manage dependencies, and track artefact state, with agent work constrained to executing within a phase. The same source describes storing artefact state (draft → in-review → approved → complete) in frontmatter, which the workflow engine reads to decide what is ready vs blocked.
This matters for design docs intended for agents: without state and gates, agents can "skip steps" (e.g., generating implementation before requirements are stable) or form circular dependencies in their own task selection, especially on larger codebases.
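The frontmatter-state pattern can be sketched in a few lines. The parsing below is deliberately naive (key: value lines between `---` delimiters), and only the draft → in-review → approved → complete progression is taken from the field notes; everything else is an assumption.

```python
# Naive frontmatter reader for the artefact-state pattern described above.
# State names follow the draft -> in-review -> approved -> complete
# progression; the parsing strategy is an illustrative assumption.
ORDER = ["draft", "in-review", "approved", "complete"]

def read_state(doc: str) -> str:
    """Extract `state:` from a ----delimited frontmatter block."""
    lines = doc.splitlines()
    if not lines or lines[0].strip() != "---":
        return "draft"              # no frontmatter: assume the earliest state
    for line in lines[1:]:
        if line.strip() == "---":   # end of frontmatter
            break
        key, _, value = line.partition(":")
        if key.strip() == "state":
            return value.strip()
    return "draft"

def implementation_ready(doc: str) -> bool:
    """Gate: agents may implement only once the design is approved."""
    return ORDER.index(read_state(doc)) >= ORDER.index("approved")
```

An orchestrator can call `implementation_ready` before dispatching any implementation work, turning the document's status field into an enforceable gate.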
"Machine-readable" does not necessarily mean formal models everywhere; it often means structured metadata where it matters, and referenceable contracts. Concrete examples:

- artefact state (draft → in-review → approved → complete) stored in frontmatter that a workflow engine can read
- explicit Status, Version, and Scope metadata at the top of an RFC
- linked, version-controlled contracts (schemas, API specs) that act as the machine-readable source of truth for behaviour
A practical synthesis is that an AI-optimised design doc benefits from having two layers of structure: (1) a human-readable narrative layer (context, proposal, trade-offs, rejected alternatives) and (2) a machine-parseable layer (status metadata, file-level plans, and linked schemas or contracts).
A major failure mode of agentic implementation is "design regression": the agent rediscovers or reintroduces approaches the team already rejected, because the rejection rationale is not captured in a way the agent can reliably apply.
Google-style guidance explicitly frames the design doc's job as documenting the trade-offs that selected one design over others, and calls "Alternatives considered" one of the most important sections because it answers "why not X?" for future readers.
Other templates embed trade-offs as explicit prompts: Uber's minimal template separates Approaches (the option space) from the Proposal (the chosen one), and Shopify's RFC template asks directly why the team should not build the proposal.
These are already "agent-friendly" because they turn trade-offs into labelled, reviewable objects rather than scattered comments.
The evidence from spec-driven workflows suggests the agent benefits when "rejection rationale" is explicit and retrieval-friendly.
A useful pattern (synthesised from the above) is to represent each major decision as a unit with: the decision itself, stated in one sentence; the constraints and requirements that drove it; and each rejected alternative paired with its rejection rationale.
If the organisation also uses ADRs, an ADR can act as the stable "final decision record" distilled from a broader RFC discussion; public evidence from Spotify describes ADRs as capturing a decision, its context, and its consequences, often emerging from RFC discussions.
Spec Kit's guidance is explicit that tasks should be "small, reviewable chunks" and implementable and testable in isolation—because this gives the agent a way to validate progress and reduces drift. The quickstart also explicitly recommends phased implementation for complex projects to avoid overwhelming agent context.
Kiro uses a dedicated tasks.md artefact and a task execution interface with status updates, reinforcing the idea that the task plan is not just documentation but an operational control surface.
Agent performance improves when ordering is explicit and dependency constraints are enforced. Kiro, for example, describes tasks.md as sequenced based on dependencies, with tasks mapping back to requirements for traceability. The dependency emphasis is consistent with McKinsey/QuantumBlack's two-layer model: the orchestration layer manages dependencies and enforces phase transitions, while agents execute within those constraints.
QuantumBlack's field notes describe a two-layer model: a deterministic orchestration layer that enforces phase transitions, manages dependencies, and tracks artefact state; and a bounded agent-execution layer, in which agents work only within the current phase's constraints.
Explicitly, this model emerged because letting agents orchestrate themselves on larger systems led to skipped steps, circular dependencies, and analysis loops.
For "design docs as agent inputs", the implication is: your design doc should make phase boundaries and prerequisites explicit (e.g., "do not implement until requirements are approved", "do not start migration until data model is final", "Phase 2 depends on Phase 1 schema changes"), ideally in a way a tool can parse.
A frequent failure dynamic is template bloat: required sections accumulate over time, and authors/reviewers feel much of the doc becomes superfluous or misplaced (content better suited for code review). At Uber scale, practitioner retrospectives describe "noise" (too many RFCs), ambiguity about when an RFC is required, and discoverability problems when docs are scattered in Google Drive or inconsistent locations.
Staleness is another long-standing issue. Google-style guidance explicitly recommends updating the design doc when implementation reality forces design changes, but also notes humans are bad at keeping documents in sync and that designs can accrete "amendments" rather than staying coherent.
AI agents can amplify staleness and ambiguity because they convert text into code at scale. If the design doc is outdated or incomplete, the agent can systematically reproduce wrong assumptions across many files faster than a human would. This is one reason McKinsey/QuantumBlack calls out loss of rationale and constraints when decisions live in chat windows rather than durable artefacts.
Agents also shift the boundary between under- and over-specification: too little detail invites the agent to fill gaps with wrong assumptions at scale, while too much detail increases the surface area that can go stale.
AI-native tools increasingly incorporate "steerability" and checkpoints designed to reduce these risks: editable spec and plan steps before code generation (Copilot Workspace), explicit phase checkpoints (Spec Kit), and operational task status tracking (Kiro).
These are workflow interventions aimed at turning "docs that don't match implementation" into "living specs" with explicit update paths.
Strong, direct evidence exists for:

- the Google-style design doc spine, its sizing guidance, and its emphasis on alternatives
- GitLab's version-controlled, living architecture design documents
- Shopify's deadline-driven RFC process and template
- spec-driven tooling that splits requirements, design, and tasks into explicit artefacts (Spec Kit, Kiro, Copilot Workspace)
- deterministic orchestration around bounded agent execution (QuantumBlack field notes)

Thinner evidence exists for:

- Stripe's internal RFC section structure (the writing culture is well documented; structure is inferred from adjacent public specs)
- Uber's current internal templates (public material mixes a minimal open-source template with retrospectives of historical practice)
Synthesising the most repeated and tool-aligned structures, a minimal doc intended to be directly actionable by an agent should include:

- Summary and Context (succinct; not a requirements dump)
- Goals and Non-goals
- Proposal, including a file-level plan of what to create/modify/delete
- Alternatives and trade-offs, with explicit rejection rationale
- A sequenced task list with dependencies and acceptance criteria
- Status metadata (e.g. frontmatter state) that tools can parse
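A doc following that outline can be linted mechanically. The sketch below is an illustrative validator; the section names and the ATX-heading (`#`) assumption are mine, not a standard.

```python
import re

# Illustrative lint: check that a markdown design doc contains each core
# section heading. Assumes ATX-style (#, ##, ...) headings and matches
# section names as substrings, so "Alternatives considered" satisfies
# "Alternatives".
CORE_SECTIONS = ["Summary", "Context", "Goals", "Non-goals",
                 "Proposal", "Alternatives"]

def missing_sections(markdown: str) -> list[str]:
    """Return the core sections with no matching heading in the doc."""
    headings = {m.group(1).strip()
                for m in re.finditer(r"^#+\s+(.+)$", markdown, re.MULTILINE)}
    return [s for s in CORE_SECTIONS
            if not any(s.lower() in h.lower() for h in headings)]
```

Run as a pre-merge check, this keeps the "minimal doc" contract enforceable rather than aspirational.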
A comprehensive RFC (or a tier-2 template) is justified when cross-cutting concerns are first-order risks: multi-region failover, capacity, SLOs, security/privacy constraints, migrations, operational readiness/runbooks. The main practical risk is that "comprehensive" becomes "default", causing template bloat and review fatigue.
A common mitigation is to keep the canonical section headings, but make most operational sections conditional ("include if applicable"), and push deep details into linked appendices or contracts (schemas, API specs, test plans) that can stay version-controlled with code.