A founder came to a discovery call last quarter with a clear product vision, a Figma prototype, and a confident timeline. “Six weeks,” he said, before I’d asked a single question. “We’ve built software before. This is just software with AI in it.”
It wasn’t wrong, exactly. AI products are software. But they scope differently. The variables that blow up a timeline are different. The success criteria need a different kind of definition. And there’s one question that almost never comes up in non-AI scoping (“what happens when the model is wrong?”) that can reshape an entire architecture.
We got to a six-week timeline for that project. But we spent the first 20 minutes of the discovery call recalibrating the scope around questions the founder hadn’t thought to ask. This post is about those questions, and the structure I use to get through them in 30 minutes.
Why AI Projects Scope Differently
When you’re scoping a CRUD application, the success criteria are usually binary: does the feature save to the database correctly, does the filter return the right rows, does the export format match the spec? Those criteria are testable, deterministic, and known in advance.
AI systems don’t work that way. They operate probabilistically. A well-built document parser might return the correct answer 94% of the time. An LLM-based compliance checker might agree with a human reviewer 91% of the time. Those are genuinely good results, but they only feel like success if the founder agreed beforehand that 94% was good enough.
If nobody talked about the accuracy threshold before the sprint started, you’ll have that conversation in week three, in the middle of a demo, when the AI makes an embarrassing mistake in front of someone important. That’s the worst place to have it.
So I have three questions in every AI discovery call that I don’t have in non-AI scoping:
- What accuracy threshold makes this product usable?
- What does the product do when the AI is wrong?
- Who owns the data, and can we use it?
Everything else on my checklist exists in regular software scoping too. Those three are specific to AI.
The Checklist
I organize the 30 minutes into three loose phases. Not rigidly. The conversation takes its own shape, but the phases keep me from running out of time on what matters.
Phase 1 (minutes 1–12): Problem and User
What problem is this solving, for whom? I ask this even when I’ve read the brief. I want to hear how the founder describes it in conversation, not in writing. The spoken version is almost always more specific.
What’s the workaround the user is running today? This is the most useful question in phase one. Every AI product replaces something: a manual process, a spreadsheet formula, a contractor, a 45-minute meeting. The workaround tells me what the AI needs to be better than. It also tells me what level of accuracy is actually required.
If a founder’s team is manually reviewing 200 applications per day and spending three minutes on each, an AI that’s right 90% of the time and cuts review time to 45 seconds is still a massive win. If the same founder’s compliance officer personally signs off on every decision and has zero tolerance for errors, 90% accuracy might be a no-go. The workaround frames the answer.
What does “working” look like on day 30? I push for something measurable. “The AI handles 80% of tier-1 support tickets without escalation” is measurable. “The support team feels relieved” is not. I’m not trying to be difficult about this. I’m trying to make sure we agree on what “done” means before anyone writes a line of code.
Phase 2 (minutes 12–25): Data, Accuracy, and Fallback
This is where AI scoping separates from general software scoping.
What data does the product need, and can I see a sample? I don’t commit to timelines without a data sample. If the founder has data in an accessible format (Postgres, S3, an API), I ask them to send a 50-100 row sample before the brief goes out. If the data is in “various places,” I treat that as an unknown and factor it into the estimate.
The data shape changes the build more than the model choice does. I’ve had projects that looked like two-week prototypes turn into six-week builds because the data needed substantial cleaning before it was usable as training or retrieval context. This is also one of the hidden costs I cover in detail in what clients underestimate about AI product costs.
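Before estimating, it helps to run even a crude profile of the sample the founder sends. The sketch below is a minimal example of what I mean by reviewing a data sample, not a tool from the post; `profile_sample` and its output shape are hypothetical names chosen for illustration.

```python
import csv
from collections import Counter

def profile_sample(path: str) -> dict:
    """Quick sanity check on a client CSV sample before committing to a timeline.

    Counts rows, lists the fields, and tallies missing values per field --
    a rough proxy for how much cleaning the data will need.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    missing = Counter()
    for row in rows:
        for field, value in row.items():
            # Treat None (short rows) and blank strings as missing.
            if value is None or not value.strip():
                missing[field] += 1

    return {
        "rows": len(rows),
        "fields": list(rows[0].keys()) if rows else [],
        "missing_by_field": dict(missing),
    }
```

A 50–100 row sample run through something like this surfaces the “various places” problem early: lots of blank fields usually means the two-week prototype is really a six-week build.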
What accuracy threshold makes this product usable? This question sometimes surprises people. I usually frame it as: “If the AI gets 90 out of 100 decisions right, is that good enough to ship?” Most founders haven’t thought about it in those terms. Walking through the answer together surfaces the actual quality bar.
For some products, 85% is fine (the user corrects errors, there’s a review layer, or the downside of being wrong is low). For others, 99% isn’t enough: medical records, legal documents, financial compliance. Understanding which world we’re in determines the architecture, the eval setup, and whether the timeline I’m thinking about is realistic.
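The “90 out of 100 decisions” framing translates directly into the simplest possible eval: compare model outputs to human labels and check the agreed bar. This is a minimal sketch of that idea; `accuracy_gate` is a hypothetical helper, and the 0.90 default stands in for whatever threshold came out of the discovery call.

```python
def accuracy_gate(predictions: list, labels: list, threshold: float = 0.90) -> dict:
    """Score model outputs against human labels and check the agreed quality bar.

    `threshold` is the number agreed in discovery -- 0.90 is a placeholder,
    not a recommendation.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must align")

    correct = sum(pred == label for pred, label in zip(predictions, labels))
    accuracy = correct / len(labels)
    return {"accuracy": accuracy, "ships": accuracy >= threshold}
```

The point isn’t the code, it’s that the check can only exist if the threshold was agreed beforehand. Without that number, week three’s demo becomes the eval.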
What happens when the AI is wrong? This is the fallback question. I ask it directly: “When the model returns a wrong answer or a low-confidence answer, what should the product do?” The options are: show the answer and let the user correct it, route the request to a human, skip the action and flag it for review, or block the request entirely.
There’s no universal right answer. But if nobody’s thought about it, the engineering team will make a default decision that may not match what the business actually needs. I’d rather that decision happen in discovery.
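The four fallback options map naturally onto confidence-based routing. The sketch below is one illustrative way to encode that decision, assuming the model exposes a confidence score; the threshold values and the `route` function are hypothetical, and the real numbers come from the accuracy conversation, not from code.

```python
from enum import Enum

class Fallback(Enum):
    """The four fallback behaviors discussed in discovery."""
    SHOW_EDITABLE = "show the answer, let the user correct it"
    ROUTE_TO_HUMAN = "send the request to a human reviewer"
    FLAG_FOR_REVIEW = "skip the action, queue it for review"
    BLOCK = "refuse the request entirely"

def route(confidence: float, *, auto: float = 0.9,
          human: float = 0.6, review: float = 0.3) -> Fallback:
    """Map model confidence to a fallback behavior.

    The thresholds here are illustrative placeholders; in practice they
    come out of the discovery call and the eval results.
    """
    if confidence >= auto:
        return Fallback.SHOW_EDITABLE
    if confidence >= human:
        return Fallback.ROUTE_TO_HUMAN
    if confidence >= review:
        return Fallback.FLAG_FOR_REVIEW
    return Fallback.BLOCK
```

Whether the business wants two tiers or four is exactly the kind of default the engineering team will otherwise pick silently, which is why I want it decided in discovery.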
Who owns the data, and what are the privacy constraints? This matters more than people expect. Sending customer data to OpenAI’s API has different legal implications than running a model locally. If the founder’s product handles healthcare records, financial data, or personal information from users in the EU, those constraints shape the architecture. I’m not a lawyer, and I say so, but I need to know whether there are constraints before I suggest an architecture.
Phase 3 (minutes 25–30): Alignment and Next Steps
What’s the first thing you’d want to test with real users? This forces a scope decision in a way that “what should we build?” doesn’t. It shifts the conversation from the full product to the most important thing to validate. Usually that’s one feature or one flow, and that’s what sprint one becomes.
Who else needs to be involved before we start? Budget sign-off, a technical co-founder who hasn’t been on the call, a legal review. I’d rather find out now than in week two. I ask this without judgment. It’s logistics, not a character assessment.
Are there any constraints I haven’t asked about? Open-ended close. Sometimes this produces nothing. Sometimes it produces the most important thing in the conversation: a specific deadline, a dependency on another vendor, a previous failed attempt with another agency. I want it on the table before I write the brief.
After the Call: What Goes Out Within 24 Hours
I wrote about my general estimation process in more detail in my post on why I don’t commit to timelines on new requirements. The brief format for AI projects has a few additions. Once the brief is agreed and the first sprint starts, I usually send a status update within the first 48 hours of the build so founders know what to expect before the first demo.
The standard brief has five sections:
- The problem in your words. One paragraph restating the problem, the user, and the workaround being replaced. If I’ve understood it wrong, I want the founder to catch it here.
- Sprint 1 scope. One specific, testable slice of the product. Usually a prototype that validates the riskiest assumption. I name the assumption explicitly.
- Open questions. Things I need to resolve before the estimate is firm. For AI projects, this almost always includes: data sample review, accuracy threshold confirmation, and fallback behavior decision. I list each one with a resolution path.
- Timeline range with three numbers. Best case, realistic, and push scenario. One sentence on what determines each. I don’t give a single number. Single numbers are guesses. Ranges with reasoning are estimates.
- Next steps. Who does what, by when. Usually: founder sends data sample, I review and respond within 24 hours, we have a 15-minute follow-up to close the open questions.
I’ve been using this format long enough to know that the founders who read it carefully and push back on section one are the best clients to work with. They’re engaged, they have opinions, and they’ll hold a reasonable standard throughout the project. The founders who just say “looks good, let’s start” sometimes turn out to be the same ones who ask in week four why the product doesn’t do something we didn’t discuss.
Teresa Torres, who wrote about continuous product discovery, makes a point I’ve internalized: discovery isn’t a phase you complete before building. It’s a habit. The 30-minute scoping call starts a discovery process that continues through every sprint. But it has to start somewhere. Getting these questions answered before sprint one begins means the first two weeks of engineering are spent building the right thing rather than figuring out what the right thing is.
For the AI-specific version of discovery, I’d add one more thing: the jobs-to-be-done framework, which asks what job the user is hiring the product to do, maps naturally onto the workaround question. If you know what the user was doing before, you know what the AI needs to replace, and you know what “better” means. That framing has made more AI projects successful than any particular model choice.
FAQ
How long does a typical AI development project take from discovery to delivery?
From a completed discovery call to a working prototype, we typically deliver in 10-15 business days. The range is wider for production builds: 4-12 weeks depending on complexity, data readiness, and integration surface. The discovery call itself is where that range gets narrower. The three AI-specific questions (accuracy threshold, fallback behavior, data state) are where most of the uncertainty lives.
Do you charge for discovery calls?
No. The discovery call is free. We use it to determine whether we can build what you actually need, not to sell you a proposal. If we can’t build it well, we’ll tell you in the call. If we can, we send a brief within 24 hours and you decide from there.
What should I prepare before a discovery call?
A one-paragraph description of the problem you’re solving and who it’s for. If you have existing data the product would rely on, a small sample (50-100 rows or a few example documents) speeds up the estimate significantly. A Figma mock or a PRD is useful context, but not required. The call will produce clarity regardless of what you bring in.
What happens if my requirements change after we’ve scoped the project?
They will, and that’s fine. We work in sprints, and each sprint ends with a demo and a scope conversation. If something changes between sprints, we assess the impact on the timeline and adjust, but we don’t absorb changes silently. The brief from the discovery call is the reference point, not a contract. When things change, we update the brief and both sides agree on what the change costs in time.
How do you handle AI projects where we don’t have data yet?
Data-free AI projects are possible but they scope differently. If your product needs to train on examples you don’t have yet, the first sprint is usually about generating or collecting that dataset. We’ve done this for several clients: building the data collection mechanism before the AI model itself. It adds 2-4 weeks to the timeline but it’s often the more reliable path than trying to build around incomplete data. We cover this in the open questions section of the brief.
Scoping an AI project and not sure where to start? Book a 30-minute call. We’ll walk through the checklist together and send you a brief by the next day.