Product & Strategy ps-3 30 min

AI Feature Scoping Workshop

Learning Objectives

Apply the five-question workshop format to scope an AI feature
Construct a minimum viable spec from workshop output
Evaluate when to run the workshop versus when to skip it
Complete a feature scope for one planned AI capability

Core Concepts

AI Feature Scope: A bounded definition of what an AI capability does, who it serves, what constitutes acceptable output, and what the failure conditions are. Without this, "AI feature" is a blank check.

Job to Be Done: The specific situation a user is in, the task they are trying to complete, and the outcome they need. AI features that don't map to a concrete job to be done tend to be impressive in demos and unused in production.

At X-company, the job to be done for their project analytics feature was: "A PM wants to understand project risk without building manual reports." Not "users want AI insights." One is actionable. The other is a wish.

Minimum Viable Spec: The smallest set of requirements that defines what must be true for the feature to be considered working. For AI features, this always includes input format, output format, acceptance criteria, and at least one explicit failure condition.

A minimum viable spec is not a full PRD. It is the output of alignment, not the input to it.

Output Contract: A description of what the AI must produce, written precisely enough that a reviewer can determine in under 60 seconds whether the output is acceptable. Vague output contracts (such as "a helpful summary") produce disagreement in QA and frustration in production.

At X-company, the output contract for their risk query interface was: one paragraph, three risk signals cited with supporting evidence, one recommended action. That sentence eliminates entire categories of bad output before engineering writes a prompt.

Failure Mode Definition: An explicit description of output that would be unacceptable. Defining failure is as important as defining success, because AI systems can produce plausible-sounding wrong answers that pass a casual review.

X-company's failure modes: generic summary without project-specific data, hallucinated deadline information, a recommendation that contradicts the current project status. Each of these became an acceptance criterion in the spec.

Key Points

Scope an AI feature by starting with the job to be done, not the technology
Define what good output looks like before you define what the system should do
Explicitly name failure modes: AI features need rejection criteria, not just success criteria
A minimum viable spec is the output of the workshop, not the starting point
The workshop takes 30 minutes when the right people are in the room; it takes three weeks when it happens in Slack

Tools, Prompts, or Templates

AI Feature Scoping Workshop

Without a structured format, feature scoping conversations drift into technology debates or requirements negotiation before the team has agreed on what the feature is for. This workshop format prevents that drift by forcing five questions in sequence. Each question must be answered before the next is opened.

Run this workshop with: the product lead, one engineer, one person who talks to customers regularly (PM, CSM, or founder), and optionally a designer. Keep it to five people or fewer. More than five and it becomes a committee.

Format it as a 30-minute working session. Block the time. Use a shared doc. Capture answers verbatim during the session and clean them up afterward.

The Five Questions

Q1: What is the job to be done? Write one sentence: who is in what situation, trying to do what, to achieve what outcome. If you can't write it in one sentence, you haven't agreed on the feature yet.

Template:
A [role] who is [situation] wants to [action] so they can [outcome].

X-company example: A project manager who is mid-project wants to understand which projects are at risk so they can take action before a deadline is missed, without building manual reports.

Q2: Where does AI fit? Describe the specific AI behaviour that addresses the job. Be concrete: natural language input, structured output, classification, generation, retrieval. Do not describe the system architecture. Describe the user-facing behaviour.

Template:
The user [inputs something]. The AI [does something specific]. The user receives [output].

X-company example: The user types a natural language query about project health. The AI retrieves project data and returns a risk summary with supporting data points. The user receives a structured summary they can act on without leaving the interface.

Q3: What does good output look like? Describe one example of output you would show to a customer and be proud of. Be specific: length, format, what information is present, what tone.

Template:
Good output is [format], contains [specific content], and [any constraints on length, tone, or structure].

X-company example: Good output is one paragraph. It names three risk signals (such as overdue milestones, budget variance, or scope additions), cites the specific project data behind each signal, and ends with one recommended action the PM can take.

Q4: What does bad output look like? Name at least two failure modes. These become your rejection criteria during QA. If you skip this question, you will discover the failure modes in production.

Template:
Bad output [does something wrong]. Also bad: [second failure mode].

X-company example: Bad output is a generic summary that doesn't reference this project's actual data. Also bad: any output that includes a deadline that isn't in the project record. Also bad: a recommendation that contradicts the current project status (such as suggesting adding resources to a project that is already marked complete).
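
A failure mode such as "a deadline that isn't in the project record" can be turned into an automated rejection check. The following is a minimal sketch, not X-company's implementation. It assumes dates are written in ISO format (YYYY-MM-DD); the function name `find_unsupported_dates` and the `record_dates` argument are hypothetical.

```python
import re

# Assumption: dates are written as YYYY-MM-DD. Other formats
# ("March 3", "next Friday") would slip past this sketch.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_unsupported_dates(output_text: str, record_dates: set[str]) -> list[str]:
    """Return dates mentioned in the output that are absent from the project record."""
    mentioned = ISO_DATE.findall(output_text)
    return sorted({d for d in mentioned if d not in record_dates})


# Hypothetical usage: one date is in the record, one is not.
record = {"2025-03-14", "2025-04-01"}
text = "Milestone due 2025-03-14 is overdue; final delivery is 2025-05-30."
unsupported = find_unsupported_dates(text, record)  # ["2025-05-30"]
```

A non-empty result means the output fails the rejection criterion. A check like this only covers the date failure mode; the generic-summary failure mode still needs a human reviewer or a separate test.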

Q5: What is the minimum viable spec? Based on Q1 through Q4, write the spec in four fields. This is the output of the workshop.

Scope boundary:     [what the feature does and explicitly does not do]
Input format:       [what the user provides]
Output format:      [what the AI returns, and in what structure]
Acceptance criteria: [three to five testable conditions]
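
If the team keeps specs alongside code, the four fields can be captured in a small data structure that flags an incomplete spec. This is an illustrative sketch only: the class name `MinimumViableSpec`, its attribute names, and the `problems` method are assumptions, not part of the workshop format.

```python
from dataclasses import dataclass, field


@dataclass
class MinimumViableSpec:
    """The four Q5 fields. Class and attribute names are illustrative."""
    scope_boundary: str
    input_format: str
    output_format: str
    acceptance_criteria: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List reasons the spec is incomplete; an empty list means it is filled in."""
        issues = []
        for name in ("scope_boundary", "input_format", "output_format"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        count = len(self.acceptance_criteria)
        # The template asks for three to five testable conditions.
        if not 3 <= count <= 5:
            issues.append(f"expected 3 to 5 acceptance criteria, found {count}")
        return issues
```

The check confirms only that the fields are filled in and that the number of criteria is in range. Whether each criterion is actually testable remains a judgment call for the people in the room.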

X-company example:

Scope boundary:      Query interface only. Read access to project data. No write operations.
                     Does not generate reports, send notifications, or modify project records.

Input format:        Natural language text query, submitted via the existing search bar.

Output format:       Structured JSON rendered as a one-paragraph summary in the UI.
                     JSON fields: risk_signals (array of three objects, each with signal_type
                     and source_data), recommended_action (string).

Acceptance criteria:
  1. Output contains exactly three named risk signals.
  2. Each risk signal cites at least one source data field from the project record.
  3. No deadline or date information appears in the output that is not present in the project record.
  4. Recommended action does not contradict the project's current status field.
  5. Output renders within four seconds of query submission under average load.
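
Because the acceptance criteria are written as testable conditions, several of them can be checked mechanically. The sketch below covers criteria 1, 2, and 4 under stated assumptions: `source_data` is assumed to hold the name of a project-record field, the project record is assumed to be a dictionary with a `status` key, and the criterion 4 check is a crude keyword heuristic. The function name `check_output` is hypothetical.

```python
import json


def check_output(raw_json: str, project_record: dict) -> list[str]:
    """Return a list of failed criteria; an empty list means all checks passed."""
    failures = []
    output = json.loads(raw_json)
    signals = output.get("risk_signals", [])

    # Criterion 1: exactly three named risk signals.
    if len(signals) != 3 or not all(s.get("signal_type") for s in signals):
        failures.append("criterion 1: need exactly three named risk signals")

    # Criterion 2: each signal cites a field that exists in the project record.
    # Assumption: source_data holds the name of a project-record field.
    for s in signals:
        if s.get("source_data") not in project_record:
            failures.append(f"criterion 2: unknown source field {s.get('source_data')!r}")

    # Criterion 4 (keyword heuristic, an assumption of this sketch): a completed
    # project should not be told to add resources.
    action = output.get("recommended_action", "").lower()
    if project_record.get("status") == "complete" and "add resources" in action:
        failures.append("criterion 4: action contradicts status 'complete'")

    return failures
```

A real contradiction check for criterion 4 would need more than keyword matching; the heuristic here only catches the specific example named in Q4.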

When to Run the Workshop vs. When to Skip It

Skipping the workshop is tempting when a feature seems obvious or when the team is already aligned. Run this check before skipping it.

Run the workshop when:

The feature involves generative AI output that a user will act on
More than one team is involved in delivery
The feature touches customer-facing interfaces
Anyone on the team has used the phrase "it depends" when asked what good output looks like

Skip the workshop when:

The feature is internal tooling with a single owner and a single user type
You are running a time-boxed spike or throwaway prototype (do the workshop before you build the real thing)
The feature has already gone through a full scoping process and you are doing a follow-on iteration with no scope change

If you are unsure, run the workshop. It takes 30 minutes. Rework takes weeks.
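
The run-or-skip check can be sketched as a small decision function. The field names below are illustrative, and one rule is an assumption of this sketch rather than something the checklist states: when a feature shows both a run signal and a skip signal, the run signal wins.

```python
from dataclasses import dataclass


@dataclass
class FeatureContext:
    """Answers to the run/skip checklist. Field names are illustrative."""
    user_acts_on_generated_output: bool = False
    multiple_teams_deliver: bool = False
    customer_facing: bool = False
    someone_said_it_depends: bool = False
    internal_single_owner_single_user: bool = False
    throwaway_spike: bool = False
    follow_on_no_scope_change: bool = False


def should_run_workshop(ctx: FeatureContext) -> bool:
    run_signals = (ctx.user_acts_on_generated_output, ctx.multiple_teams_deliver,
                   ctx.customer_facing, ctx.someone_said_it_depends)
    skip_signals = (ctx.internal_single_owner_single_user, ctx.throwaway_spike,
                    ctx.follow_on_no_scope_change)
    # Assumption: any run signal outweighs a skip signal.
    if any(run_signals):
        return True
    # Skip only on a clear skip signal; when unsure, default to running it.
    return not any(skip_signals)
```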

Actionable Takeaways

Block 30 minutes with your product lead, one engineer, and one customer-facing person before your next AI feature kickoff. Use the five-question format.
Write your output contract before you write a prompt. If you can't describe good output in two sentences, you are not ready to build.
Name two failure modes for every AI feature on your current roadmap. Add them to the spec as rejection criteria.
Use the minimum viable spec template to replace the "AI requirements" section of your next PRD.
Share the bad output examples from Q4 with QA before implementation begins, not after.

Practical Examples

X-company: Full Workshop Walkthrough

X-company's product team ran this workshop for their project analytics query interface. Here is the complete session output.

Attendees: Product lead, senior engineer, customer success lead (who runs weekly calls with law firm clients), and the founder.

Q1: Job to be done

A project manager at a professional services firm who is managing five or more active projects wants to understand which projects are at risk without pulling data manually from multiple views, so they can prioritize their attention before a client call or a deadline.

Q2: Where does AI fit?

The user types a natural language question into the search bar (for example: "which projects are behind schedule?"). The AI queries the project database and returns a structured risk summary. The user receives a plain-language summary with the supporting data visible below it.

Q3: Good output

One paragraph. Three risk signals named explicitly: for example, milestone overdue by seven days, budget at 94% with 40% of work remaining, two scope additions in the last sprint. Each signal cites the underlying data. The paragraph ends with one action: "Consider a scope review meeting for Harrington & Associates before the end of the week."

Q4: Bad output

A generic paragraph that says "several projects may be at risk" without naming any of them.
Any mention of a completion date that is not in the project record (the team had seen Claude hallucinate a deadline in testing).
A recommendation to add resources to a project already marked complete.

Q5: Minimum viable spec

Scope boundary:      Read-only query interface. No write access. Queries project and
                     milestone records only. Does not surface financial data from
                     the billing module in this version.

Input format:        Natural language text, up to 200 characters, submitted via
                     the existing global search bar component.

Output format:       Structured JSON:
                     {
                       "risk_signals": [
                         { "signal_type": string, "source_data": string }
                       ],
                       "recommended_action": string
                     }
                     Rendered as one paragraph in the UI. Source data shown as
                     collapsed footnotes.

Acceptance criteria:
  1. Output contains exactly three risk_signals objects.
  2. Each source_data value in risk_signals references a field present in the project record.
  3. No date or deadline appears in the output that is not present in the project record.
  4. recommended_action does not contradict the project.status field.
  5. P95 response time under five seconds on a dataset of 50 active projects.
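
The two numeric limits in this spec (the 200-character input cap and the P95 latency target) are straightforward to check in a test harness. The sketch below is one way to do it, assuming latency samples are collected in seconds and using the nearest-rank method for the percentile; the function names are hypothetical and the spec does not say how P95 should be computed.

```python
MAX_QUERY_CHARS = 200     # from the input format field
P95_LIMIT_SECONDS = 5.0   # from acceptance criterion 5


def query_within_limit(query: str) -> bool:
    """True if the query fits the 200-character input limit."""
    return len(query) <= MAX_QUERY_CHARS


def p95(samples: list[float]) -> float:
    """95th percentile by the nearest-rank method (an assumed choice)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_criterion(samples: list[float]) -> bool:
    """True if P95 latency is under the five-second limit."""
    return p95(samples) < P95_LIMIT_SECONDS
```

With 20 samples, the nearest-rank P95 is the 19th-fastest response, so a single slow outlier does not fail the criterion but two do.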

The workshop took 28 minutes. The spec went directly into the engineering brief. Engineering estimated the first implementation in the same session because the scope boundary was explicit.

Contrasting Example: What Happens Without the Workshop

The same team had tried to scope an AI meeting summary feature six weeks earlier without a structured process. After two weeks of async Slack discussion, they had:

Three different interpretations of what "summary" meant (action items only, full transcript summary, or executive briefing format)
No agreement on what to do when the AI summary contradicted what someone remembered saying
An engineering estimate based on the most complex interpretation

The feature was put on hold. Running the workshop retroactively on the meeting summary feature took 35 minutes and produced a spec that the team built and shipped in one sprint.

Implementation Workflow

Complete this workflow now, using a real AI feature from your current roadmap or planning backlog.

Choose your feature. Pick one planned AI capability your team has discussed but not formally scoped. If you have multiple candidates, choose the one where there is the most disagreement about what it should do.
Identify your attendees. Name one product person, one engineer, and one person who has direct contact with the users or customers this feature serves. You can run this solo as a planning exercise, but the output will be stronger with at least one other person.
Set a 30-minute timer and open a shared doc.
Answer Q1: Job to be done. Write one sentence using the template: "A [role] who is [situation] wants to [action] so they can [outcome]." Do not move to Q2 until everyone agrees the sentence is accurate.
Answer Q2: Where does AI fit? Write the user-facing behaviour in three sentences: what the user inputs, what the AI does, what the user receives. Avoid architecture and infrastructure language here.
Answer Q3: Good output. Describe one example of output you would show a customer and be proud of. Be specific about format, length, and content. Write it as if you are describing a screenshot.
Answer Q4: Bad output. Name at least two failure modes. For each one, write one sentence that starts with "Bad output is..." or "Also bad:..." These become your QA rejection criteria.
Answer Q5: Minimum viable spec. Fill in the four fields from the template: scope boundary, input format, output format, and acceptance criteria. Write the acceptance criteria as numbered testable conditions. There should be between three and five.
Evaluate whether to proceed. After completing all five questions, ask: does this feature still make sense to build? Is the scope boundary realistic given your current data access and system architecture? Are the acceptance criteria achievable with the model and tooling you have? If the answer to any of these is no or uncertain, note it explicitly before the doc is closed.
File the spec. Attach the Q5 output to your feature ticket, PRD, or engineering brief. Share Q4 (bad output examples) with your QA lead before implementation begins.