A founder told me in January that his product needed an AI chatbot because his two biggest competitors had launched one. I asked what the chatbot would do. He said it would answer product questions. I asked which questions users were currently struggling to find answers for. He paused. Then: “That’s honestly a good question. I don’t know.”
The discovery call lasted another forty minutes. We never talked about a chatbot again. We ended up building a semantic search layer over his existing documentation. Not AI in the headline sense, but it solved the actual problem, shipped in six days, and his support team stopped fielding the same twelve questions every week.
I push back on “AI features” regularly. Not because founders are wrong to ask. They’re usually responding to real pressure. But the pressure is coming from the wrong direction.
The Three Requests I’ve Learned to Scrutinize
“We need an AI chatbot on our homepage” comes in four or five times a quarter. Always from founders who’ve seen a competitor launch one. Usually preceded by: “Our investors asked if we have AI in the product.”
The problem isn’t that chatbots are bad. We’ve built them for clients with real use cases: high-volume support teams, repetitive query patterns, measurable deflection targets. The problem is “our competitor has one” isn’t a use case. It’s comparative anxiety wearing a product spec.
“We should add AI-generated content” is the second. The framing varies: scale the blog, personalize onboarding emails, auto-summarize reports. Each is a real capability we’ve shipped. But when I ask which part of the content workflow is actually breaking, the answer is usually that it isn’t. The founder wants to feel like they’re moving on AI before a board meeting, not because their team is blocked.
“We need a recommendation engine” is the third. This one comes from founders who’ve had a product investor reference Netflix or Spotify. Recommendation systems are hard, expensive, and depend entirely on data density that most seed-stage products don’t have. I ask how many interactions per user per week the product currently drives. When the number is under twenty, the engine would be recommending noise.
None of these requests are stupid. Every one points to a real ambition. But ambition isn’t a feature spec.
Why I Don’t Just Say Yes
I could build the chatbot. I could ship the content pipeline. I could stand up a recommendation API. Our team has done all three more than once.
But if I take a sprint budget and deliver something that doesn’t move a number the founder can point to in six weeks, I’ve done two things wrong. I’ve spent their money on something that doesn’t produce evidence, and I’ve taught them that an AI build is the kind of thing that feels impressive and doesn’t generate proof. That’s exactly the reputation I’m trying to help them avoid with their board.
MIT Sloan’s research on enterprise AI adoption consistently finds that organizations struggle most not with building AI capabilities, but with connecting those capabilities to outcomes they can measure. The feature isn’t the hard part. Defining what success looks like before you build is.
When a feature doesn’t measure anything, doesn’t convert anyone, and doesn’t cut a real cost, it’s a demo. Demos have their place, usually in a prototype conversation. Not in a sprint that comes out of a production budget.
How I Push Back Without Derailing the Relationship
The push-back isn’t adversarial. It’s a shift in question.
When a founder says “we need an AI chatbot,” I say: “Walk me through a week without it. Which interaction fails? Which user leaves? Which support ticket doesn’t get resolved?” If the answer describes a real, recurring failure mode, we’re still in chatbot territory and I’ll spec it. If the answer is “we just don’t have one,” we’re somewhere else.
The frame I use is outcomes, not capabilities. Every AI feature request I’ve ended up supporting traces back to a specific metric. Gartner’s research on AI deployment points to misalignment between technical capabilities and measurable business outcomes as the leading reason AI initiatives stall after initial rollout. That tracks with what I see on calls.
“What changes for your users in week one if this feature is live?” is my standard question. I ask it early, sometimes before I’ve heard the rest of the description. If the founder can answer with a concrete number or behavior change, we’re working with a real use case. If they can’t, I say: “Let’s run a two-day discovery sprint before we write the spec. I want to understand the failure mode first.” For more on this, read our guide on Why I Don’t Use the Word ‘AI’ in Discovery Calls Anymore.
Most founders agree. The ones who don’t usually want to ship something before a demo. I’ll sometimes build a prototype for that, but it comes with an explicit conversation: this is a demo artifact, not a production feature, and we’ll revisit the spec before it goes into the product.
What Usually Happens in the Discovery Sprint
Two days. Both of us looking at the same data.
Day one is a workflow audit. I ask the founder or their team lead to walk me through the specific process we’d be replacing or augmenting. I’m looking for the friction point: the manual step that takes longer than it should, the error that happens more than twice a week, the user action the product currently makes impossible. I take notes. I ask why three times for each piece of friction.
By the end of day one, I usually have a clearer picture than the founder does of what the actual problem is. Not always, but often. The chatbot request sometimes turns into a search problem. The recommendation engine request sometimes turns into better filtering. The content pipeline request sometimes turns into a template with one-click generation that takes four days to build and costs under $5,000.
Day two is me coming back with a reformulated spec. I’ll present two versions: the feature as originally requested, with honest cost and timeline estimates, and the reformulated version with the same. Then I ask which one solves the problem we identified on day one.
About 60% of the time, the reformulated spec wins. About 30% of the time, the original request was right, and the discovery sprint just confirmed it with better framing. About 10% of the time, we find out the product isn’t ready for an AI feature yet. The data is too thin, the workflow is too undefined, or the team doesn’t have capacity to integrate and monitor an AI layer. In that case, I say it directly. We’ll book a follow-up in 90 days.
The Features That Earn Their Sprint Slot
Every AI feature that made it through this process and into production shares a common structure.
It measures something. Not “user satisfaction” in the abstract. A specific number: support ticket volume, time to first response, form completion rate, document processing time, call-scoring accuracy. If you can’t write the measurement in a single sentence, the feature isn’t ready for sprint planning.
It maps to an existing manual process. Someone on the team is currently doing, by hand, what the feature would automate. That person can validate whether the output is correct. Their weekly hours are the cost estimate for “what we’re replacing.”
It has a failure mode the team understands. When the AI gets it wrong, what happens? Is it a bad search result the user retries? A flagged call transcript a human reviews? The features that work have a degraded-mode answer before they go live. The features that don’t work are usually the ones where the failure mode was “we’ll figure it out.”
This framework comes from running AI development projects for founders across a range of build sizes and timelines. It’s held across every build that shipped clean and every one that ran over. The same pattern shows up in how we evaluate AI vendors ourselves: the first question is always “what metric does this move?”
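The three checks above (a specific metric, an existing manual process, an understood failure mode) can be written down as a small pre-sprint checklist. What follows is a minimal sketch, not a tool we ship: the `FeatureRequest` fields, the `readiness_gaps` function, and the example values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FeatureRequest:
    """Hypothetical record of an AI feature request (illustrative fields only)."""
    name: str
    metric: str                   # the one-sentence measurement; "" if nobody can write it
    manual_owner: str             # who does this by hand today; "" if nobody does
    manual_hours_per_week: float  # rough cost of the process being replaced
    failure_fallback: str         # what happens when the model is wrong; "" if undecided


def readiness_gaps(req: FeatureRequest) -> list[str]:
    """Return the checks this request fails; an empty list means it can be scoped."""
    gaps = []
    if not req.metric.strip():
        gaps.append("no specific metric")
    if not req.manual_owner.strip() or req.manual_hours_per_week <= 0:
        gaps.append("no existing manual process")
    if not req.failure_fallback.strip():
        gaps.append("no degraded-mode answer")
    return gaps


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop in a metric, e.g. response time in minutes."""
    return 100 * (before - after) / before


# A request whose only justification is "our competitor has one" fails all three checks.
chatbot = FeatureRequest("homepage chatbot", "", "", 0, "")
print(readiness_gaps(chatbot))

# A request with a metric, an owner, and a fallback passes (values are made up).
triage = FeatureRequest(
    "ticket triage",
    "first-response time in minutes",
    "support lead",
    10,
    "a human reviews flagged tickets",
)
print(readiness_gaps(triage))
```

The same single-sentence metric makes the before-and-after arithmetic trivial once the feature is live: a first-response time that drops from 4 hours (240 minutes) to 18 minutes gives `percent_reduction(240, 18)`, a 92.5% reduction.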
The Founders Who Get This Right
The founders who push back on their own requests before I have to are the ones I find most productive to work with.
One of them, a B2B SaaS founder in workflow automation, walked into our first call and said: “I think I want an AI layer, but I’m not sure if I want the feature or just the investor signal.” We spent forty minutes mapping the actual gaps in his product. We ended up building a structured output layer that processed 800 support tickets a week, cut first-response time from 4 hours to 18 minutes, and cost $12,000 to build. He had a concrete number in his next board update, and a live demo showing it working.
That’s what real AI project management looks like in practice. Not building what sounds impressive. Building what measures true.
If you’re trying to figure out whether an AI feature request is real or marketing, book a 30-minute call. I’ll run the discovery questions with you and tell you honestly which category yours falls into.
FAQ
How do I know if an AI feature request is worth pursuing?
Ask one question: can someone on your team write down the metric this feature is supposed to move? If yes, it’s worth scoping. If no, run a two-day discovery sprint first to identify the real failure mode before committing a sprint budget. A feature spec written from a vague “we need AI” brief will almost always need revisions after discovery anyway.
What does a discovery sprint for AI development services cost?
At Kalvium Labs, the two-day discovery sprint is included in our scoping process before we quote any build. It’s not a separate billable line. If you’re working with another provider, expect a scoping fee of $500 to $2,000 depending on complexity. That cost is almost always recovered in a more accurate estimate, which means fewer scope-change conversations during the sprint.
How long does it take to build a real AI feature once the spec is clear?
Most AI features with a defined scope, clean input data, and a measurable output ship in two to four weeks. A working prototype with defined inputs can be ready in 72 hours. The variable that slows most builds is data prep. If the data the feature needs exists but isn’t clean or accessible, add one to two weeks before your timeline estimate starts.
When should I say no to an AI feature entirely?
When the data doesn’t exist yet, when the manual process it would automate hasn’t been defined clearly enough to replicate, or when the team doesn’t have capacity to integrate and monitor an AI layer. None of these are permanent blocks. They’re timing issues. Build the data first. Define the process manually first. Free up PM bandwidth first. The AI feature will be cheaper and more reliable when those preconditions are met.
What’s the difference between a prototype and a production AI feature?
A prototype demonstrates capability on a fixed dataset. It isn’t monitored, doesn’t handle edge cases, and doesn’t integrate into your product’s auth and deployment stack. A production feature handles live data, logs outputs, and has a defined process for when the model gets something wrong. The production version usually costs three to five times the prototype’s sprint budget. For investor demos, a prototype is usually enough. For features your users rely on daily, you need the production version.