Spec-Driven Development sdd-2 25 min

The Spec-Generate-Review Loop

Learning Objectives

Run a complete spec-generate-review cycle using a provided spec
Evaluate generated output against acceptance criteria
Identify the four common loop failure modes and how to catch them
Apply one iteration to close a gap found during review

Core Concepts

The Loop Structure

The spec-generate-review loop has four stages that execute in order and repeat until the output meets its acceptance criteria.

Stage 1: Spec. The agent receives a complete, agent-readable spec as its input. The spec defines the feature, its acceptance criteria, its constraints, and its scope boundaries. Nothing is added verbally during generation.

Stage 2: Generate. The agent produces output based strictly on the spec. This could be code, a schema, a test suite, a workflow definition, or documentation. The generation is treated as a black box: the spec is the only input that should matter.

Stage 3: Review. A reviewer (human, automated, or both) evaluates the output against the acceptance criteria in the spec. The review is not an opinion: it is a structured check against defined conditions. Every gap found is recorded with a specific reference back to the spec criterion it violates.

Stage 4: Iterate. If the review finds gaps, a targeted correction is written: a clarification added to the spec, a revised constraint, or a supplementary prompt that addresses only the failing criteria. The loop runs again from Stage 2 with the updated input.

The loop exits when all acceptance criteria pass or when the team explicitly decides a remaining gap is out of scope for this iteration.
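
As a rough illustration, the four stages can be sketched as a small driver function. Everything here is hypothetical: `run_loop`, `Gap`, `LoopResult`, and the three callables are illustrative names, and the cap of three iterations is an assumed default rather than a rule from the loop itself.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gap:
    criterion_id: str  # spec criterion the output violates, e.g. "AC-2"
    note: str


@dataclass
class LoopResult:
    output: str
    iterations: int
    open_gaps: list[Gap] = field(default_factory=list)


def run_loop(
    spec: str,
    generate: Callable[[str], str],
    review: Callable[[str, str], list[Gap]],
    revise_spec: Callable[[str, list[Gap]], str],
    max_iterations: int = 3,
) -> LoopResult:
    """Repeat generate -> review -> iterate until review finds no gaps."""
    output = ""
    gaps: list[Gap] = []
    for iteration in range(1, max_iterations + 1):
        output = generate(spec)          # Stage 2: the spec is the only input
        gaps = review(spec, output)      # Stage 3: check against the criteria
        if not gaps:
            return LoopResult(output, iteration)
        spec = revise_spec(spec, gaps)   # Stage 4: the correction goes into the spec
    # Still failing at the cap: hand the open gaps back for diagnosis.
    return LoopResult(output, max_iterations, gaps)


# Toy stand-ins for an agent and a reviewer, only to show the control flow.
def toy_generate(spec: str) -> str:
    return "handler with lock" if "FOR UPDATE" in spec else "handler"


def toy_review(spec: str, output: str) -> list[Gap]:
    return [] if "lock" in output else [Gap("AC-2", "No transaction or lock present")]


def toy_revise(spec: str, gaps: list[Gap]) -> str:
    return spec + "\nConstraint: use SELECT ... FOR UPDATE"


result = run_loop("AC-2: prevent double-booking", toy_generate, toy_review, toy_revise)
print(result.iterations, result.open_gaps)
```

The driver never retries with an unchanged spec: every pass after the first runs against a spec that the previous review has modified.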

Acceptance Criteria as the Review Standard

Acceptance criteria are the only valid standard for review. An output is not "good" or "bad" in the abstract: it either satisfies a specific criterion or it does not.

Each criterion in the spec maps directly to a review question. For the scheduling platform's booking flow:

Spec criterion	Review question
Time slots display in the client's local timezone	Does the rendered slot use the client's timezone offset?
Double-booking is prevented at the database level	Is there a unique constraint or transaction lock on slot reservations?
Confirmation email is sent within 30 seconds of booking	Is the email trigger async with an observable timeout?
Admin can cancel any booking with a reason field	Does the cancellation endpoint accept and persist a reason string?

If the review question cannot be answered yes or no from the output, the criterion was not specific enough. That is a spec problem, not a generation problem, and the fix belongs in Stage 1.
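
One way to make the yes/no rule concrete is to record each review question with a three-valued answer. This is a minimal sketch: `CriterionCheck` and `verdict` are hypothetical names, and the third criterion is an invented example of a vague criterion, not one from the booking flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriterionCheck:
    criterion: str
    question: str
    answer: Optional[bool] = None  # True = yes, False = no, None = not answerable


def verdict(check: CriterionCheck) -> str:
    """Map a review answer to an outcome."""
    if check.answer is None:
        # The output cannot settle the question: the criterion needs rewriting.
        return "spec problem"
    return "pass" if check.answer else "gap"


checks = [
    CriterionCheck(
        "Double-booking is prevented at the database level",
        "Is there a unique constraint or transaction lock on slot reservations?",
        answer=False,
    ),
    CriterionCheck(
        "Admin can cancel any booking with a reason field",
        "Does the cancellation endpoint accept and persist a reason string?",
        answer=True,
    ),
    CriterionCheck(
        "The booking page feels fast",  # hypothetical vague criterion
        "Is the page fast?",
    ),
]
verdicts = [verdict(check) for check in checks]
print(verdicts)
```

Keeping "not answerable" separate from "no" stops a vague criterion from being logged as a generation failure.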

The Correction Type Matters

Not all gaps require the same fix. Applying the wrong correction wastes an iteration.

Gap type	Correct fix	Wrong fix
Missing feature: the output omits something the spec required	Add a constraint or example to the spec that makes the requirement unambiguous	Re-run generation without changing anything
Scope creep: the output includes something the spec did not ask for	Add an explicit exclusion to the spec	Accept the extra output and move on
Ambiguous interpretation: the agent chose one valid reading of an unclear criterion	Rewrite the criterion to remove the ambiguity	Write a new prompt from scratch
Structural mismatch: the output format does not match the expected shape	Add a format constraint or a worked example to the spec	Ask the agent to "fix the format" verbally

Loop Depth

Most features converge in one to three iterations. If you are past three iterations on the same criterion without convergence, stop and diagnose. Repeated failure almost always traces back to one of the four loop failure modes described in the next section.

Four Common Loop Failure Modes

Failure mode 1: Reviewing against unstated criteria. The reviewer flags something the spec never required. The agent cannot have failed a criterion that was not written down. Resolution: decide whether the criterion belongs in the spec. If yes, add it and run another iteration. If no, remove the flag.

Failure mode 2: Accepting partial output without recording the gap. The reviewer notices a gap, judges it minor, and ships anyway. The gap is not recorded. It reappears in a different form two iterations later and is harder to trace. Resolution: every gap gets logged, even if the decision is to defer it. No silent acceptances.

Failure mode 3: Changing the spec mid-generation. Someone adds a requirement verbally, in a comment, or in a follow-up message while the agent is still generating. The spec and the output are now misaligned before review even starts. Resolution: freeze the spec before generation begins. All changes go through Stage 1.

Failure mode 4: Running review without the spec present. The reviewer evaluates the output from memory or general intuition. Without the spec in front of them, subjective judgment replaces criterion-based review. Some real gaps are missed; some non-issues are flagged. Resolution: the spec is present and open during every review. Review questions are derived directly from its acceptance criteria.
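
Two of these failure modes lend themselves to a mechanical guard. The sketch below is an assumption about how a team might do it, not part of the loop itself: it treats any line starting with a label like "AC-1:" as a written criterion (failure mode 1) and uses a SHA-256 fingerprint of the spec text to detect changes made after the freeze (failure mode 3). All function names are hypothetical.

```python
import hashlib
import re


def criterion_ids(spec_text: str) -> set[str]:
    """Collect criterion labels such as "AC-1" that start a line in the spec."""
    return set(re.findall(r"^(AC-\d+):", spec_text, flags=re.MULTILINE))


def unstated_flags(spec_text: str, flagged_ids: list[str]) -> list[str]:
    """Review flags that cite a criterion the spec never wrote down."""
    known = criterion_ids(spec_text)
    return [cid for cid in flagged_ids if cid not in known]


def fingerprint(spec_text: str) -> str:
    """Hash of the spec text, taken when the spec is frozen for generation."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def spec_changed(frozen: str, spec_text_at_review: str) -> bool:
    """True if the spec under review differs from the frozen version."""
    return fingerprint(spec_text_at_review) != frozen


spec = "AC-1: Slots display in the client's timezone.\nAC-2: No double-booking.\n"
frozen = fingerprint(spec)

print(unstated_flags(spec, ["AC-2", "AC-9"]))
print(spec_changed(frozen, spec + "AC-3: Added during generation.\n"))
```

A flag returned by `unstated_flags` is not automatically wrong; it is the cue to decide whether the criterion belongs in the spec.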

Key Points

The loop has four stages: Spec, Generate, Review, Iterate. It exits when all acceptance criteria pass or when the team explicitly defers a remaining gap.
Acceptance criteria are the only valid review standard. Each criterion maps to a yes/no review question.
Different gap types require different fixes. Match the correction to the root cause.
Most features converge in one to three iterations. Repeated failure signals a spec problem, not a generation problem.
The four failure modes are: reviewing against unstated criteria, silent acceptance of gaps, mid-generation spec changes, and reviewing without the spec present.

Tools, Prompts, or Templates

Booking Flow Spec

Without a concrete spec to generate against, the loop has no entry point. This template gives the scheduling platform team a production-ready spec for their core booking flow. It includes acceptance criteria structured so each one maps directly to a review question.

Use this spec as the input to Stage 2 in the implementation workflow below.

FEATURE SPEC: Booking Flow
Feature: Client-facing time slot booking
Status: Ready for generation
Version: 1.0

OVERVIEW
Clients visit a public booking page for a host. The page displays available
time slots for the next 14 days. The client selects a slot, enters their
name and email, and submits the booking. The system reserves the slot,
sends a confirmation email to the client, and notifies the host.

SCOPE
In: slot display, slot reservation, confirmation email, host notification
Out: calendar sync, payment, recurring bookings, waitlisting

ACCEPTANCE CRITERIA
AC-1: Available slots are displayed in the client's local timezone, derived
      from the browser's timezone offset at page load.
AC-2: A slot becomes unavailable to other clients the moment a booking
      is submitted (not when it is confirmed). Enforced at the database
      level with a row-level lock or unique constraint on (host_id, slot_start).
AC-3: The client receives a confirmation email within 30 seconds of a
      successful booking. The email contains: host name, slot date and time
      in the client's timezone, a calendar file attachment (.ics), and a
      cancellation link.
AC-4: The host receives a notification (email or in-app, based on their
      preference setting) within 60 seconds of a new booking.
AC-5: If slot reservation fails (conflict or error), the client sees an
      inline error message and the form remains populated. No email is sent.
AC-6: The booking form requires: client name (max 100 chars), client email
      (valid format). No other fields are required.
AC-7: The public booking page loads in under 2 seconds on a standard
      broadband connection (tested with no cold-start penalty).

CONSTRAINTS
- Backend: Node.js with a PostgreSQL database
- Email: sent via the existing SendGrid integration
- No authentication required for clients
- Slot duration is fixed per host (set in the host's profile, not per booking)
- The booking page URL format is: /book/{host-slug}

EXAMPLES
Example 1: A client in Tokyo (UTC+9) views a host's page. The host has a
slot at 09:00 UTC. The client sees "6:00 PM" as the slot time.

Example 2: Two clients attempt to book the same slot simultaneously. One
succeeds. The other sees: "This slot was just taken. Please choose another
time." The form stays open with their details intact.

OUT OF SCOPE (EXPLICIT)
- Calendar sync (Google Calendar, Outlook): not in this version
- Payments or deposits: not in this version
- Booking modification by the client after confirmation: not in this version
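
Example 1 in the spec can double as a test case for AC-1. The following is a minimal sketch that assumes the client's offset arrives as minutes east of UTC; `slot_label` and the calendar date are hypothetical, and only the 09:00 UTC time and the UTC+9 offset come from the spec.

```python
from datetime import datetime, timedelta, timezone


def slot_label(slot_start_utc: datetime, client_offset_minutes: int) -> str:
    """Format a UTC slot start as a 12-hour time in the client's offset."""
    client_tz = timezone(timedelta(minutes=client_offset_minutes))
    local = slot_start_utc.astimezone(client_tz)
    hour12 = local.hour % 12 or 12          # 0 and 12 both display as 12
    suffix = "AM" if local.hour < 12 else "PM"
    return f"{hour12}:{local.minute:02d} {suffix}"


# The date is arbitrary; the spec example only fixes the time of day.
slot = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
label = slot_label(slot, 9 * 60)  # Tokyo, UTC+9
print(label)  # 6:00 PM
```

One detail worth checking in generated frontend code: a browser's `Date.prototype.getTimezoneOffset()` returns minutes with the opposite sign (-540 for UTC+9), so a sign error here would show the Tokyo client "12:00 AM" instead of "6:00 PM".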

Review Checklist

Reviewing generated output from memory produces inconsistent results. This checklist exists to make the review stage criterion-driven: every row links a spec requirement to a concrete check, so the reviewer arrives at a pass or fail without judgment calls.

Adapt the checklist for each spec by replacing the AC references with your own criteria.

REVIEW CHECKLIST: Booking Flow
Spec version: 1.0
Reviewer: _______________
Review date: _______________
Output reviewed: _______________  (e.g. "booking API endpoint + frontend slot component")

CRITERION CHECKS

AC-1 (Timezone display)
[ ] Slot times rendered using client timezone offset, not UTC or server time
[ ] Timezone derived at page load (not hardcoded)
Pass / Fail / Deferred: ___

AC-2 (Conflict prevention)
[ ] Reservation happens on submit, not on confirmation
[ ] Database constraint present: unique or row-level lock on (host_id, slot_start)
[ ] No window between submit and lock where double-booking is possible
Pass / Fail / Deferred: ___

AC-3 (Confirmation email)
[ ] Email triggered on successful booking
[ ] Timing: within 30 seconds (async trigger present, timeout observable)
[ ] Email contains: host name, slot datetime in client timezone, .ics attachment,
    cancellation link
Pass / Fail / Deferred: ___

AC-4 (Host notification)
[ ] Notification triggered on new booking
[ ] Delivery method respects host preference setting
[ ] Timing: within 60 seconds
Pass / Fail / Deferred: ___

AC-5 (Failure handling)
[ ] Conflict or error shows inline message (not a full-page error)
[ ] Form stays populated after failure
[ ] No email sent on failed booking
Pass / Fail / Deferred: ___

AC-6 (Form validation)
[ ] Name field: required, max 100 chars
[ ] Email field: required, valid format enforced
[ ] No extra required fields present
Pass / Fail / Deferred: ___

AC-7 (Page load)
[ ] Load time target documented or tested
[ ] No unnecessary blocking resources on the booking page
Pass / Fail / Deferred: ___

SCOPE CHECK
[ ] No calendar sync code included
[ ] No payment logic included
[ ] No booking modification endpoint included

OVERALL RESULT
All criteria pass: YES / NO
Gaps to address: (list AC references)
Recommendation: Ship / Iterate / Escalate
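
If the checklist is kept in a tool rather than on paper, the OVERALL RESULT block can be derived from the per-criterion marks. This is a sketch under one stated assumption: any Fail means another iteration, and deferred items ship as recorded gaps. The Escalate decision is left to the reviewer. `Status` and `summarize` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    DEFERRED = "Deferred"


def summarize(results: dict[str, Status]) -> dict[str, object]:
    """Fill in the OVERALL RESULT block from per-criterion statuses."""
    failed = [ac for ac, status in results.items() if status is Status.FAIL]
    deferred = [ac for ac, status in results.items() if status is Status.DEFERRED]
    return {
        "all_criteria_pass": not failed and not deferred,
        "gaps_to_address": failed,
        "deferred": deferred,  # still recorded, never silently dropped
        # Assumption: any Fail triggers another iteration.
        "recommendation": "Iterate" if failed else "Ship",
    }


results = {
    "AC-1": Status.PASS,
    "AC-2": Status.FAIL,
    "AC-3": Status.DEFERRED,
}
summary = summarize(results)
print(summary["recommendation"], summary["gaps_to_address"], summary["deferred"])
```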

Actionable Takeaways

Before your next AI-assisted development task, write down three to five acceptance criteria before you generate anything. Run the review against those criteria only, not general impressions.
For your scheduling platform work or any feature currently in progress: pull up your last piece of AI-generated output and check it against the Booking Flow Review Checklist format. Note which criteria you had actually checked and which you had judged by instinct.
Share the four failure modes with your team. Ask each person to identify which one they most often encounter. The answer will tell you where your team's loop breaks down.
Log every gap you find during review, even the ones you decide to defer. A gap log after three iterations will show you whether the same criterion keeps failing (spec problem) or whether gaps are distributed (normal convergence).
If you are managing a team that uses AI generation: make spec-present review a non-negotiable step. The single rule "the spec must be open during review" prevents failure modes 1 and 4 with zero additional overhead.
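
The gap-log takeaway can be checked with a few lines of code. The sketch below assumes a log of (iteration, criterion) pairs and treats a criterion that failed in three or more iterations as a repeat offender; the function name and the log contents are invented for illustration.

```python
from collections import Counter


def repeat_offenders(gap_log: list[tuple[int, str]], threshold: int = 3) -> list[str]:
    """Criteria that failed in at least `threshold` distinct iterations."""
    iterations_failed: Counter[str] = Counter()
    # set() collapses duplicate entries for the same criterion in one iteration.
    for _iteration, criterion_id in set(gap_log):
        iterations_failed[criterion_id] += 1
    return sorted(c for c, n in iterations_failed.items() if n >= threshold)


gap_log = [
    (1, "AC-3"),
    (1, "AC-2"),
    (2, "AC-3"),
    (3, "AC-3"),
    (3, "AC-3"),  # logged twice by two reviewers
]
offenders = repeat_offenders(gap_log)
print(offenders)
```

Here AC-3 fails in every iteration, which points at the spec, while AC-2 fails once and then converges.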

Practical Examples

Example 1: First iteration on the booking flow (engineer perspective)

A backend engineer receives the Booking Flow Spec and runs generation for the slot reservation endpoint. The agent produces a Node.js POST handler at /bookings with a basic check against existing bookings using a SELECT query before inserting.

Review against AC-2 finds a gap: the SELECT and INSERT are not wrapped in a transaction with a row-level lock. Two concurrent requests could both pass the SELECT check before either INSERT executes. The double-booking window exists.

The engineer records the gap: "AC-2: Race condition between SELECT and INSERT. No transaction or lock present."

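The fix is a spec clarification: the constraint section is updated to require BEGIN TRANSACTION ... SELECT ... FOR UPDATE ... INSERT ... COMMIT, with the lock taken on a row that already exists when the request arrives (for example, a row representing the slot). This detail matters because in PostgreSQL a FOR UPDATE query that returns no rows locks nothing, so locking only "existing bookings" would leave the window open. The loop runs again. The second generation produces a handler with an explicit transaction and a FOR UPDATE lock on that row. AC-2 passes.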

One iteration. One criterion. One targeted fix.
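
AC-2 also permits a unique constraint on (host_id, slot_start) in place of a lock. The sketch below shows what a reviewer would expect that constraint to do, using an in-memory SQLite database purely as a stand-in for PostgreSQL. The table layout, the `client_email` column, and the `reserve` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bookings (
        host_id      INTEGER NOT NULL,
        slot_start   TEXT    NOT NULL,
        client_email TEXT    NOT NULL,
        UNIQUE (host_id, slot_start)
    )
    """
)


def reserve(host_id: int, slot_start: str, client_email: str) -> bool:
    """Try to reserve a slot; return False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (host_id, slot_start, client_email) VALUES (?, ?, ?)",
                (host_id, slot_start, client_email),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate; no prior SELECT is needed.
        return False


first = reserve(1, "2025-01-15T09:00:00Z", "a@example.com")
second = reserve(1, "2025-01-15T09:00:00Z", "b@example.com")
print(first, second)
```

With the constraint in place, the database decides the winner at INSERT time, so there is no gap between a check and a write for a second request to slip through.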

Example 2: Scope creep caught in review (product manager perspective)

A product manager is reviewing generated frontend output for the booking page. The agent has built the slot display, the booking form, and a confirmation screen. It has also added a "View my bookings" link for clients to see their booking history.

This is not in scope. The spec explicitly excludes booking modification, and a booking history view was never defined.

The reviewer does not accept the extra feature. They add an explicit out-of-scope line to the spec: "Client booking history view: not in this version." They flag the output as failing the scope check and run another iteration. The second generation omits the history link.

The spec is now more explicit than before, which protects future iterations from the same drift.

Example 3: Three-iteration convergence on email content (technical director perspective)

A technical director is overseeing generation of the confirmation email for AC-3. The first iteration produces an email with host name and slot time but no .ics attachment and no cancellation link.

Gap log after iteration 1: "AC-3: Missing .ics attachment. Missing cancellation link."

The spec is updated to include a format example for the email and an explicit note that the .ics file must be generated server-side using the ical library.

Iteration 2 produces an email with an .ics attachment and a cancellation link. The link format is /cancel?token= with a query parameter. Review checks whether the token is signed or just the booking ID. The spec did not specify. The reviewer decides this is a security requirement that belongs in the spec and adds it: "Cancellation link must use a signed token (HMAC-SHA256) with a 7-day expiry."

Iteration 3 produces a signed cancellation link. AC-3 passes in full.

Three iterations, each addressing one specific gap, each fix written into the spec before re-running.
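
For the criterion added in iteration 2, it helps the reviewer to know roughly what a signed, expiring token looks like. This is a minimal sketch, not the format the agent produced: the `booking_id.expiry.signature` layout, the function names, and the secret handling are all assumptions. Only HMAC-SHA256 and the 7-day expiry come from the spec.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SEVEN_DAYS = 7 * 24 * 60 * 60  # expiry window in seconds


def _sign(payload: str, secret: bytes) -> str:
    digest = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_token(booking_id: str, secret: bytes, now: Optional[float] = None) -> str:
    """Build "booking_id.expires.signature" valid for seven days."""
    issued = time.time() if now is None else now
    payload = f"{booking_id}.{int(issued + SEVEN_DAYS)}"
    return f"{payload}.{_sign(payload, secret)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[str]:
    """Return the booking ID if the token is authentic and unexpired, else None."""
    try:
        booking_id, expires_text, signature = token.rsplit(".", 2)
        expires = int(expires_text)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{booking_id}.{expires_text}", secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    current = time.time() if now is None else now
    return booking_id if current <= expires else None


secret = b"demo-secret"  # hypothetical; a real key would come from configuration
token = make_token("bk_123", secret, now=1_000)
print(verify_token(token, secret, now=1_000))
```

The review question for the new criterion then becomes answerable from the output: does changing the booking ID in the link, or waiting past the expiry, cause verification to fail?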

Discussion Prompt

Think about the last feature your team shipped using AI-generated output. Which of the four failure modes, if any, occurred during review? What single spec change would have caught it?

Implementation Workflow

Complete this workflow using the Booking Flow Spec and Review Checklist provided above.

1. Read the spec in full before generating anything. Open the Booking Flow Spec. Read every acceptance criterion. For each one, write a single yes/no review question you will ask of the generated output. This is your personal review standard for this iteration.
2. Select a scope for this iteration. Choose one part of the booking flow to generate: the slot reservation endpoint, the booking form frontend, or the confirmation email trigger. Do not try to generate the full feature in one pass. Write the scope at the top of a blank document: "Generating: [your chosen scope]."
3. Run generation against the spec. Submit the Booking Flow Spec to your AI agent with a clear instruction: generate the component you selected in step 2, using the spec as the complete requirement. Do not add verbal requirements. Do not modify the spec after you submit it.
4. Open the Review Checklist alongside the generated output. Do not review from memory. The spec and the checklist are both open. Work through every criterion that applies to the component you generated. For each criterion, mark Pass, Fail, or Deferred. Write a one-line note for every Fail or Deferred entry.