AI/TLDR

What Is Spec-Driven Development? Writing Specs for AI Agents

Understand why a written spec is the highest-leverage artifact in agent coding, what a good spec contains, and how plan-then-build workflows reduce rework.

BEGINNER · 10 MIN READ · UPDATED 2026-06-12

In plain English

Spec-driven development (SDD) is a workflow where you write a structured document describing what you want to build before you ask an AI agent to write a single line of code. The spec is the source of truth. The code is just the output the agent derives from it.

Think of it like an architect and a construction crew. The architect doesn't hand a crew a napkin sketch and say "build something nice." They produce blueprints — precise drawings that spell out dimensions, materials, load-bearing walls, and where every outlet goes. The crew follows the blueprints. If the building ends up wrong, you fix the blueprint, not just the wall. AI agents work the same way: hand them a vague instruction and they'll build something plausible that may not be what you wanted. Hand them a precise spec and they have a blueprint they can follow — and re-read every time they get confused.

Spec-driven development is not a new idea. API-first development, OpenAPI specs, and test-driven development (TDD) all put a formal artifact before implementation. What changed in 2025 is that AI coding agents raised the stakes: a vague request now produces a large volume of plausible but wrong code very quickly, while a proper spec closes most of that gap. Tools like GitHub Spec Kit, AWS Kiro, and Tessl each shipped structured spec workflows as a core feature, a sign that much of the industry was converging on the same practice: write the spec first, then let the agent build.

Why it matters

The alternative to SDD has a name: vibe coding. You describe roughly what you want in a chat message, the agent ships some code, you correct it, it ships more code, you correct again. This loop is fast for tiny one-off tasks. It falls apart on anything with more than a few moving parts.

Why does vibe coding fail at scale? Three compounding reasons:

  • Context window drift. As the conversation grows, earlier instructions get pushed out of the model's active context. The agent starts guessing about constraints it read forty messages ago. A spec that gets re-read every turn keeps the goal anchored.
  • Scope creep by hallucination. Without a written definition of done, the agent decides when it's done. It may add features you didn't want, skip edge cases you needed, or invoke a library version that doesn't exist.
  • No shared source of truth. In a team, multiple engineers may all be prompting the same agent about the same feature. Without a spec, each prompt is a fresh interpretation and the resulting code contradicts itself.

A spec addresses all three problems. It's a persistent document the agent can re-read at any point in the session. It defines the acceptance criteria that mark a task complete. And it gives the whole team a shared, reviewable record of what was decided before a line of code was written.

The productivity math is also compelling. Teams using SDD tools like GitHub Spec Kit have reported roughly 20-40% extra token spend from the agent re-reading the spec each turn, a cost they say is offset many times over by eliminating rework cycles that would otherwise burn hours of engineer time.
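To make that trade-off concrete, here is a back-of-the-envelope sketch. The dollar figures are hypothetical inputs chosen for illustration, not measurements; only the 20-40% overhead range comes from the text above.

```python
def spec_overhead_cost(baseline_token_cost: float, overhead_fraction: float) -> float:
    """Extra token spend caused by re-reading the spec, in dollars."""
    return baseline_token_cost * overhead_fraction


def break_even_hours(extra_cost: float, engineer_hourly_rate: float) -> float:
    """Hours of avoided rework needed to pay back the extra token spend."""
    return extra_cost / engineer_hourly_rate


# Hypothetical inputs, for illustration only.
baseline = 50.0  # dollars of tokens for one feature built without a spec
rate = 100.0     # dollars per engineer hour

for overhead in (0.20, 0.40):
    extra = spec_overhead_cost(baseline, overhead)
    minutes = break_even_hours(extra, rate) * 60
    print(f"{overhead:.0%} overhead: ${extra:.2f} extra, "
          f"paid back by {minutes:.0f} min of avoided rework")
```

Under these assumed numbers, the spec pays for itself if it prevents only a few minutes of rework per feature. Plug in your own token costs and rates before drawing conclusions.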

How it works

Most SDD workflows follow the same four-stage pipeline regardless of which tool you use. The names differ slightly (GitHub Spec Kit calls them Spec, Plan, Tasks, Implement; AWS Kiro uses similar terminology), but the structure is essentially the same.

Stage 1: The spec

The spec is a plain-language document, usually Markdown, that answers four questions: What does this feature do? Who is it for? What does success look like? What is explicitly out of scope? You own the spec even if the agent drafts it from your rough description. The crucial step is reviewing it before moving on, because the spec is where you catch ambiguity before it costs you.

Stage 2: The plan

The plan is the technical translation of the spec. It names the files that will be created or changed, the data models and their types, the API endpoints and their payloads, the libraries to use, and any architecture constraints (e.g., "use the existing AuthService, don't create a new one"). The plan is where ambiguity in the spec surfaces as a concrete disagreement — which is exactly when you want to catch it.

Stage 3: Tasks

Tasks break the plan into small, sequential, individually testable steps. Each task should be completable in one agent turn and verifiable with a single git diff review. "Add the User type to types.ts" is a good task. "Build the auth system" is not. The task list is what keeps a long session from losing its place: when the agent completes a task it marks it done, so after a context reset it knows exactly where to resume.
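The resume-after-reset behavior can be sketched in a few lines. This assumes the task list is stored as Markdown checkboxes (`- [ ]` and `- [x]`), which is a common convention but not something every tool mandates. The function names and the sample tasks are hypothetical.

```python
import re
from typing import Optional

# Assumed format: one Markdown checkbox per task,
# e.g. "- [ ] Add the User type to types.ts".
TASK_LINE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_tasks(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, description) pairs in file order."""
    tasks = []
    for line in markdown.splitlines():
        match = TASK_LINE.match(line)
        if match:
            done = match.group(1).lower() == "x"
            tasks.append((done, match.group(2).strip()))
    return tasks


def next_task(markdown: str) -> Optional[str]:
    """First unchecked task, or None when everything is done."""
    for done, description in parse_tasks(markdown):
        if not done:
            return description
    return None


# Hypothetical task list for the sign-up feature.
TASKS_MD = """\
- [x] Add the User type to types.ts
- [x] Add a createUser method to UserService
- [ ] Return 409 from POST /signup on duplicate email
- [ ] Render the signup form on /signup
"""

print(next_task(TASKS_MD))
```

Because progress lives in the file rather than in the conversation, a fresh session can call something like `next_task` and pick up at the first unchecked item.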

Stage 4: Implement

Only now does the agent write code. Because it has the spec and the task list in its context, it can answer its own questions: "Should I use fetch or axios?" — check the plan. "Is this edge case in scope?" — check the spec. "What's next?" — check the task list. The agent's output is predictable and reviewable because it was designed to be before the session started.
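One way to keep the spec and task list in context for every turn is to pin them and trim only the conversation history. The sketch below is a simplified illustration, not how any particular tool assembles its prompts. It uses a character budget as a stand-in for a token budget, and `build_turn_context` is a hypothetical name.

```python
def build_turn_context(spec: str, task_list: str,
                       history: list[str], budget_chars: int) -> str:
    """Assemble the text for one agent turn.

    The spec and task list are always included. Only the conversation
    history is trimmed, oldest messages first, to fit the budget.
    """
    pinned = (f"## Spec\n{spec}\n\n## Tasks\n{task_list}\n"
              "\n## Recent messages\n")
    remaining = budget_chars - len(pinned)
    kept: list[str] = []
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(history):
        cost = len(message) + 1  # +1 for the joining newline
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return pinned + "\n".join(kept)


# Hypothetical session: fifty messages, far more than fits the budget.
history = [f"message {i}: " + "x" * 40 for i in range(50)]
context = build_turn_context(
    spec="Goal: add email/password sign-up.",
    task_list="- [ ] Add the User type to types.ts",
    history=history,
    budget_chars=400,
)
print(context)
```

The design choice is that the budget is spent on the spec first. Old chat messages are the part that gets dropped, so the goal and the next task never fall out of view.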

What goes in a good spec

A spec doesn't need to be long. It needs to be precise. A one-page spec that answers the right questions beats a ten-page spec that wanders. Here's the minimum viable structure:

| Section | Question it answers | Example |
| --- | --- | --- |
| Goal | What are we building and why? | Add email/password sign-up to the existing auth flow. |
| Users / context | Who triggers this feature? What do they already have? | New visitors who don't yet have a Google account. |
| Requirements | What must be true when done? | User can create an account; email must be verified before login. |
| Acceptance criteria | How do we know it works? | Unit tests pass; signup form renders on /signup; duplicate email returns 409. |
| Out of scope | What are we explicitly not building? | Password reset, OAuth providers, admin invite flow. |
| Constraints | What must the implementation respect? | Use the existing UserService; no new npm packages without approval. |
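The same structure can be held as a small data type and rendered to a Markdown file for the repo. This is a minimal sketch: the `Spec` class and its field names are illustrative choices, and the values are the examples from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """One field per section of the minimum viable spec."""
    goal: str
    users: str
    requirements: list[str]
    acceptance_criteria: list[str]
    out_of_scope: list[str]
    constraints: list[str] = field(default_factory=list)

    def sections(self) -> list[tuple[str, object]]:
        return [
            ("Goal", self.goal),
            ("Users / context", self.users),
            ("Requirements", self.requirements),
            ("Acceptance criteria", self.acceptance_criteria),
            ("Out of scope", self.out_of_scope),
            ("Constraints", self.constraints),
        ]

    def missing_sections(self) -> list[str]:
        """Titles of sections left empty, so gaps surface before building."""
        return [title for title, value in self.sections() if not value]

    def to_markdown(self) -> str:
        parts = []
        for title, value in self.sections():
            if isinstance(value, str):
                body = value
            else:
                body = "\n".join(f"- {item}" for item in value)
            parts.append(f"## {title}\n{body}")
        return "\n\n".join(parts) + "\n"


spec = Spec(
    goal="Add email/password sign-up to the existing auth flow.",
    users="New visitors who don't yet have a Google account.",
    requirements=["User can create an account",
                  "Email must be verified before login"],
    acceptance_criteria=["Unit tests pass",
                         "Signup form renders on /signup",
                         "Duplicate email returns 409"],
    out_of_scope=["Password reset", "OAuth providers", "Admin invite flow"],
    constraints=["Use the existing UserService",
                 "No new npm packages without approval"],
)
print(spec.to_markdown())
```

A plain Markdown file written by hand works just as well. The value of the typed version is mainly that `missing_sections` flags an empty section before the agent starts.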

The acceptance criteria row is the highest-leverage part. It translates "done" from a feeling into a test. When the agent can run a command and see it pass, it knows it's finished — it doesn't have to guess. This is why SDD and TDD pair so naturally: write the failing test as part of the spec, and the agent's job is to make it pass.
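As an illustration, the criterion "duplicate email returns 409" can be written as an executable check. The `signup` function here is a hypothetical in-memory stand-in for a real handler, included only so the example runs on its own. In practice the test would be written first and would fail until the agent implements the handler.

```python
def signup(users: set[str], email: str) -> int:
    """Hypothetical sign-up handler returning an HTTP-style status code."""
    key = email.strip().lower()  # treat addresses case-insensitively
    if key in users:
        return 409  # Conflict: an account with this email already exists
    users.add(key)
    return 201  # Created


def test_new_email_creates_account() -> None:
    assert signup(set(), "ada@example.com") == 201


def test_duplicate_email_returns_409() -> None:
    users: set[str] = set()
    signup(users, "ada@example.com")
    assert signup(users, "Ada@Example.com") == 409


# Run the acceptance checks directly; a test runner such as pytest
# would also collect these functions by their test_ prefix.
test_new_email_creates_account()
test_duplicate_email_returns_409()
print("acceptance checks passed")
```

The agent no longer has to judge whether the feature "looks done". It runs the checks and reads the result.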

The out-of-scope row matters almost as much and is easy to skip. Agents are helpful by nature: they'll add error handling, logging, and extra validations you didn't ask for. Without an explicit boundary, scope creep is nearly inevitable. Listing what you're not building saves just as much rework as listing what you are.

Tools that implement SDD

By mid-2026, many AI coding tools have shipped a structured spec workflow. The three most widely discussed are:

| Tool | Spec format | Key feature | Best for |
| --- | --- | --- | --- |
| GitHub Spec Kit | Markdown files in your repo | Open-source toolkit; works with any agent including Copilot | Teams already on GitHub who want a lightweight, portable format |
| AWS Kiro | Markdown specs + steering files | Steering files inject rules into every session; agent hooks fire on file events | Teams building on AWS who want deep IDE integration |
| Tessl | Proprietary spec language | "Spec-as-source": code is generated from spec, not edited directly | Projects where the spec must stay the primary artifact long-term |

An analysis of these three tools published on Martin Fowler's site, written by Birgitta Böckeler, distinguishes three levels of SDD ambition. Spec-first means you write a spec before coding but manage the code directly afterward. Spec-anchored means the spec stays authoritative throughout the project and the agent references it on every change. Spec-as-source means the spec is the only thing you edit: code is always regenerated from it, never hand-modified. Of the three tools, Tessl is the one most explicitly pursuing spec-as-source.

For most developers, spec-first is the right starting point. It requires no new tooling beyond a text editor and your existing agent. Write a Markdown spec file, drop it in your repo, and tell the agent to read it before starting. That single habit avoids many of the vibe-coding failure modes described earlier.

DeepLearning.AI also released a course titled Spec-Driven Development with Coding Agents that teaches the full workflow hands-on — a useful next step for anyone who wants a structured introduction.

Going deeper

Once you've adopted basic spec-first habits, there are several directions to push further.

Context engineering and spec design

Specs are a form of context engineering — the broader discipline of deliberately shaping everything an agent reads at inference time. A well-structured spec is one of the highest-ROI context artifacts because it persists across sessions, survives context-window resets, and applies equally to different agents working on the same feature. The spec becomes your team memory, not just an individual prompt.

Automated spec validation

Advanced SDD workflows add tooling that validates specs before the agent starts. This means checking that every acceptance criterion is testable (maps to a runnable test), that out-of-scope items don't appear in the task list, and that the plan references only libraries that actually exist in the project. Tools like Kiro's hooks can wire this validation into file-save events so spec drift is caught the moment it's introduced.
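The three checks described above can be sketched as a standalone function. This does not use Kiro's hook API or any other tool's interface. `SpecBundle` and `validate` are hypothetical names, the sample data is invented for illustration, and the out-of-scope check is a naive substring match that a real validator would need to improve on.

```python
from dataclasses import dataclass


@dataclass
class SpecBundle:
    # Maps each acceptance criterion to a test name ("" if none exists yet).
    acceptance_criteria: dict[str, str]
    out_of_scope: list[str]
    tasks: list[str]
    plan_libraries: list[str]
    project_dependencies: set[str]


def validate(bundle: SpecBundle) -> list[str]:
    """Return a list of human-readable problems; empty means the spec passes."""
    problems = []
    # 1. Every acceptance criterion must map to a runnable test.
    for criterion, test_name in bundle.acceptance_criteria.items():
        if not test_name:
            problems.append(f"criterion {criterion!r} has no test")
    # 2. Out-of-scope items must not appear in the task list.
    for item in bundle.out_of_scope:
        for task in bundle.tasks:
            if item.lower() in task.lower():
                problems.append(f"out-of-scope {item!r} appears in task {task!r}")
    # 3. The plan may only reference libraries the project already has.
    for library in bundle.plan_libraries:
        if library not in bundle.project_dependencies:
            problems.append(f"plan uses {library!r}, not a project dependency")
    return problems


bundle = SpecBundle(
    acceptance_criteria={
        "duplicate email returns 409": "test_duplicate_email_returns_409",
        "signup form renders on /signup": "",
    },
    out_of_scope=["password reset"],
    tasks=["Add the User type to types.ts",
           "Add password reset email template"],
    plan_libraries=["axios", "bcrypt"],
    project_dependencies={"axios"},
)
for problem in validate(bundle):
    print(problem)
```

Wired into a save hook or a CI step, a check like this reports drift as a short list of problems instead of leaving it to surface later in a diff.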

Spec-driven reviews

Code review is faster when the reviewer reads the spec before the diff. Instead of inferring why a change was made from the code itself, the reviewer simply checks whether the implementation matches the spec. This flips the review dynamic: ambiguity in the code becomes an implementation error against a clear spec, not a debate about intent. Some teams check the spec into the same pull request as the code so the review is always spec-first.

Choosing your level of commitment

SDD exists on a spectrum. A minimal adoption — a single Markdown file describing requirements and acceptance criteria — costs five minutes and prevents most agent drift. A full adoption — steering files, automated spec validation, hooks, and multi-agent orchestration — is an engineering investment that pays off on large, long-running projects. Start minimal. Add rigor where you feel the pain.

FAQ

Do I need a special tool to do spec-driven development?

No. The core practice is just writing a structured Markdown document before you start prompting. You can do that in any text editor and paste it into Claude Code, Cursor, or Copilot. Tools like GitHub Spec Kit and AWS Kiro add structure, automation, and IDE integration, but they are enhancements, not requirements.

How long should a spec be?

As short as possible while still answering: what are we building, what does done look like, and what is out of scope? For a small feature, one page is plenty. For a multi-week project, three to five pages is typical. Length isn't the goal — precision is. A vague ten-page document is worse than a crisp one-page spec.

Can the AI agent write the spec for me?

Yes, and many workflows start exactly this way: you give the agent a rough description, it drafts a spec, and you edit and approve it before implementation begins. The key is that a human reviews and approves the spec before the agent starts building. The approval step is where you catch ambiguity — if you skip it, you're just doing vibe coding with extra paperwork.

What is the difference between spec-driven development and test-driven development?

TDD requires you to write failing tests before writing code. SDD requires you to write a spec before writing code. They're highly complementary — acceptance criteria in a spec often translate directly into test cases. Many SDD practitioners write tests as part of the spec stage and let the agent make them pass during implementation.

How does spec-driven development help with context window limits?

AI agents work in a limited context window. As a long conversation grows, earlier instructions get pushed out and the agent starts guessing. A spec file that the agent re-reads at the start of each turn keeps it anchored to the original goal regardless of how long the session runs. It also lets you safely start a fresh session: hand the new session the spec and the task list, and it knows exactly where to pick up.

Is spec-driven development only for large projects?

No, but the payoff grows with complexity. For a one-file bug fix, a one-sentence prompt is fine. For anything that touches multiple files, has edge cases, or spans more than one coding session, a brief spec pays for itself in avoided rework. A good rule of thumb: if you'd be annoyed by the agent misunderstanding the scope, write a spec.

Further reading