The founder called on a Thursday evening. We’d shipped a working product in six weeks, which he was genuinely happy with. The budget was a different conversation.
“I thought we had a fixed price,” he said.
We did. For the scope we’d agreed on in week one. The scope in week six was not the same scope.
I’ve had that call more times than I’d like. After 20+ AI builds in the last two years, I’ve stopped being surprised by budget overruns. What surprises me now is when they don’t happen. The conditions that cause them are consistent, predictable, and almost always traceable back to the same five problems.
The Integration Tax Nobody Budgets For
Every AI product we build needs to connect to something the client already has: a CRM, a call recording platform, a data warehouse, an existing web app with a twelve-year-old API.
That integration work almost never takes the time we estimate at the start. It takes 3-5x longer. Not because engineers are slow. Because existing systems have quirks that only appear when you’re actually connecting to them.
One project: we budgeted four days to integrate with a client’s telephony provider. The provider’s documentation said they had webhook support. They did. But the webhook payload format had changed in a 2022 update, and the documentation still described the 2019 format. Finding that discrepancy cost two developers three days before we could write a single line of AI code.
Another project: a client’s internal data warehouse didn’t support date-range filtering via API. Every integration call we needed during testing required a human on the client’s side to run a manual CSV export first. We hadn’t priced in that dependency. It added a week.
What we do differently now: integration gets its own line item in every estimate, with a minimum buffer of 15% of the total scope. If I can’t get clear answers about existing systems in the discovery call, the buffer goes higher, not lower.
The questions I ask now that I didn't before: What does the API documentation look like? When was it last updated? Who owns the systems we're connecting to?
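The buffer rule above can be sketched as a small function. The 15% floor is from this article; the higher tiers for partial or missing discovery answers are illustrative placeholders, not fixed percentages we use.

```python
# Hypothetical sketch of the integration-buffer rule: 15% of total scope as a
# floor, raised when discovery can't produce clear answers about the client's
# existing systems. The 'partial' and 'unknown' rates are invented examples.

def integration_buffer(total_hours, discovery_clarity):
    """Return integration buffer hours for a given discovery-clarity level.

    discovery_clarity: 'clear', 'partial', or 'unknown'.
    """
    rates = {"clear": 0.15, "partial": 0.20, "unknown": 0.30}
    if discovery_clarity not in rates:
        raise ValueError(f"unknown clarity level: {discovery_clarity}")
    return round(total_hours * rates[discovery_clarity], 1)

print(integration_buffer(400, "clear"))    # 60.0 hours
print(integration_buffer(400, "unknown"))  # 120.0 hours
```

The point of encoding it at all: the buffer is a function of what you learn in discovery, not a constant you copy between proposals.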
Accuracy Iteration Is Unbounded Without a Target
When a founder asks “how long will it take to build a model that can do X?”, there’s a truthful answer and a useful answer. The truthful answer: it depends on the accuracy you need and the quality of your training data. The useful answer requires knowing both before the project starts.
Most AI projects don’t know both at the start.
Here’s what typically happens. A team builds a classifier or a RAG retriever. It works at 72% accuracy in the first iteration. The client wanted 90%. Another round: 81%. Then 86%. Then 88%. Then three rounds trying to close the last two points, because marginal returns shrink as you approach the target.
That curve from 72% to 90% might take two weeks. Or six. Without a defined accuracy target agreed on before engineering starts, there’s no milestone that lets you call the model “done.” The iteration phase becomes an open-ended expense.
What we do differently: accuracy thresholds go into the Sprint 0 document. Not “we’ll aim for high accuracy.” Specific numbers: “the compliance checker must agree with the human reviewer at least 88% of the time on a 200-sample test set.” That test set is defined and locked. If the model hits 88%, the milestone is met. If we’re at 84% after three rounds, we have a real conversation: invest more to close the gap, or ship at 84%?
That conversation is uncomfortable in Sprint 0. It’s much worse in week seven, when the original estimate is exhausted.
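The milestone check itself is simple arithmetic, which is exactly why it belongs in the Sprint 0 document. A minimal sketch, assuming a locked test set of reviewer labels and the 88% threshold from the compliance-checker example (the function names and data are illustrative):

```python
# Sketch of the Sprint 0 accuracy milestone: does the model agree with the
# human reviewer often enough on the locked test set? Threshold and the
# 200-sample set size come from the article; everything else is illustrative.

def agreement_rate(model_labels, reviewer_labels):
    """Fraction of samples where the model agrees with the human reviewer."""
    if len(model_labels) != len(reviewer_labels):
        raise ValueError("predictions and test set must be the same length")
    matches = sum(m == r for m, r in zip(model_labels, reviewer_labels))
    return matches / len(model_labels)

def milestone_met(model_labels, reviewer_labels, threshold=0.88):
    """True if the model hits the agreed accuracy target on the locked set."""
    return agreement_rate(model_labels, reviewer_labels) >= threshold

# Example: 180 agreements out of 200 samples -> 90% agreement, milestone met.
reviewer = ["compliant"] * 200
model = ["compliant"] * 180 + ["non-compliant"] * 20
print(agreement_rate(model, reviewer))  # 0.9
print(milestone_met(model, reviewer))   # True
```

The locked test set matters as much as the threshold: if the evaluation samples change between rounds, the milestone moves with them and "done" stays undefined.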
What Demos Do to Budgets
AI demos are unusually powerful compared to traditional software demos. When a client sees a working model classify their actual sales calls with 89% accuracy, they don’t think “great, that’s what we agreed on.” They think: “if it can do that, can it also…”
I wrote about the mechanics of this in Scope Creep in AI Projects: How I Manage It. The short version for budget purposes: every demo is a scope risk if there’s no written scope lock.
The pattern is consistent. A client starts with one feature. The sprint-two demo works well. The sprint-three demo works better. By sprint four, the backlog has three new ideas that weren't in the original scope, and the client expects them in the original budget.
This isn’t malicious. Clients respond to what they’re seeing in real time, and assume the original estimate had headroom for their enthusiasm. It almost never does.
What the written scope lock does specifically: it makes the original agreement concrete. When a client proposes something new, I can say “that’s not in scope for this project, but let me scope it for the next sprint.” The document makes the boundary visible without making it personal. You can read how we build that document during the discovery call process here.
Infrastructure and Compliance Surprises
Building an AI product in a clean development environment is very different from building one that has to meet real-world constraints: HIPAA, GDPR, SOC 2, a corporate security review, or infrastructure requirements from a larger partner the client forgot to mention.
We ran a project for an enterprise client where, four weeks in, they mentioned almost in passing that all data processing had to happen within their existing AWS VPC. We’d been building against a managed vector database service. Migrating to a self-hosted solution running inside their VPC added two weeks we hadn’t budgeted.
Another project: a healthcare client needed patient data to stay within India’s geographic boundaries. The third-party transcription API we’d selected didn’t have data residency options. Finding an alternative with comparable accuracy and compliant hosting took a week we hadn’t priced.
Neither client was being difficult. They just didn’t know, at the start, that these constraints were relevant. And we hadn’t asked specifically enough.
What we ask in every discovery call now: Are there data residency requirements? What are the information security policies we need to work within? Is there an IT security review process for new vendors? Are there existing infrastructure commitments that affect where we deploy?
These aren’t AI-specific questions. They’re standard software delivery questions. But AI projects add surface area because they touch data more extensively and bring new third-party services into the stack.
The Estimation Approach We Had to Rebuild
Early in our work, we estimated AI projects the same way we’d estimate standard software: based on features and complexity, with a margin for testing and QA. That worked fine for the software we’d built before. It doesn’t work for AI.
Here’s the PM version of why: the “done” test for an AI feature is different from any feature test I’d seen before. You can’t declare a model finished the way you declare a form submission finished. It’s finished when it’s accurate enough, and “accurate enough” requires rounds of evaluation that nobody has a line item for. That gap between “technically working” and “production-ready” is where most of the budget goes.
What our estimates look like now (Atlassian’s agile estimation guide covers the mechanics well for traditional software; the AI-specific additions are the iteration and integration rows):
| Component | Old allocation | New allocation |
|---|---|---|
| Development | 80% | 55% |
| AI model iteration | 0% (buried in dev) | 20% |
| Integration buffer | 5% | 15% |
| Documentation + handoff | 5% | 10% |
| Contingency | 10% | 0% (moved to explicit line items) |
The difference isn’t budgeting more money overall. It’s budgeting the right things. When iteration is hidden inside “development,” clients see the development number and assume the 10% contingency is generous padding. When the iteration runs long, the project runs over.
Making iteration explicit does two things: it gives clients a realistic picture of where time goes, and it creates a specific milestone that can trigger a conversation if the iteration takes longer than expected.
We also stopped using contingency as a catch-all buffer. Instead, we scope the most predictable unknown (integration complexity) at a fixed percentage sized against the quality of the client’s existing documentation. A 15% integration buffer is more honest than a 10% miscellaneous contingency that nobody knows how to measure.
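To make the table concrete, here is the new allocation applied to a hypothetical 400-hour engagement. The percentages come from the table above; the 400-hour total and the code itself are illustrative.

```python
# Turning the allocation table into explicit per-line-item hours.
# Percentages are from the article's "new allocation" column; the
# 400-hour engagement is a made-up example.

NEW_ALLOCATION = {
    "development": 0.55,
    "ai_model_iteration": 0.20,
    "integration_buffer": 0.15,
    "documentation_handoff": 0.10,
}

def line_items(total_hours, allocation):
    """Split a total budget into explicit per-line-item hours."""
    total_share = sum(allocation.values())
    assert abs(total_share - 1.0) < 1e-9, "allocation must sum to 100%"
    return {name: round(total_hours * share, 1) for name, share in allocation.items()}

print(line_items(400, NEW_ALLOCATION))
# {'development': 220.0, 'ai_model_iteration': 80.0,
#  'integration_buffer': 60.0, 'documentation_handoff': 40.0}
```

Notice there is no "contingency" key: every hour lands in a named line item, which is the whole argument of this section.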
What to Do If You’re Already Over Budget
If you’re mid-project and the numbers don’t match the plan, start with a scope audit, not a budget conversation.
A scope audit takes about two hours: list everything in the active backlog, categorize it as (1) in original scope, (2) added during the project, or (3) technically required but not anticipated. Then look at how much remaining budget is allocated to each category.
Most over-budget projects I’ve reviewed aren’t over because engineering is slow. They’re over because categories 2 and 3 are larger than anyone realized. Once that’s visible, the client and team can make a real decision: cut category 2 additions, extend the budget, or ship category 1 and plan a phase two.
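The audit is mechanical enough to sketch in a few lines. The three categories follow the article; the backlog items and hour figures below are invented for illustration.

```python
# Sketch of the two-hour scope audit: tag every backlog item with one of
# three categories, then sum the remaining hours per category. Items and
# hours here are illustrative.

from collections import defaultdict

IN_SCOPE = "original scope"
ADDED = "added during project"
REQUIRED_UNPLANNED = "required but not anticipated"

backlog = [
    # (item, category, estimated remaining hours)
    ("call classifier",        IN_SCOPE,           40),
    ("CRM sync",               IN_SCOPE,           25),
    ("Slack notifications",    ADDED,              30),
    ("multi-language support", ADDED,              50),
    ("VPC migration",          REQUIRED_UNPLANNED, 35),
]

def scope_audit(backlog):
    """Sum remaining hours allocated to each scope category."""
    totals = defaultdict(int)
    for _item, category, hours in backlog:
        totals[category] += hours
    return dict(totals)

print(scope_audit(backlog))
# {'original scope': 65, 'added during project': 80,
#  'required but not anticipated': 35}
```

In this invented example the original scope is the smallest bucket, which is exactly the situation the audit is designed to surface before the budget conversation starts.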
The harder conversation is when infrastructure surprises have consumed the contingency. In those cases, I’m direct: “We’re in a situation we didn’t anticipate. Here’s what it cost, here’s what it bought us, here’s what we need to finish. What do you want to do?” The PMI’s guide on project change management covers the formal approvals side if you need a documented chain. But ambiguity about where the money went is almost always worse than the direct conversation.
For ongoing projects, the best investment is a weekly budget check alongside the sprint review. We track three numbers every week: hours consumed versus plan, estimated hours to completion, and the delta between original scope and current scope. If the delta is growing faster than expected, we raise it before it becomes a problem, not after.
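The three weekly numbers fit in one small record. A minimal sketch, with field names and figures of my own invention rather than any real tracking tool:

```python
# Sketch of the weekly budget check: hours consumed vs. plan, estimated
# hours to completion, and the delta between original and current scope.
# All names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class WeeklyCheck:
    hours_consumed: int
    hours_planned_to_date: int
    estimated_hours_to_complete: int
    original_scope_hours: int
    current_scope_hours: int

    @property
    def burn_variance(self):
        """Positive means we've burned more hours than planned by this week."""
        return self.hours_consumed - self.hours_planned_to_date

    @property
    def scope_delta(self):
        """Hours of scope added since the original agreement."""
        return self.current_scope_hours - self.original_scope_hours

week5 = WeeklyCheck(
    hours_consumed=210,
    hours_planned_to_date=200,
    estimated_hours_to_complete=180,
    original_scope_hours=360,
    current_scope_hours=410,
)
print(week5.burn_variance)  # 10 -- slightly over planned burn
print(week5.scope_delta)    # 50 -- scope has grown; raise it at the review
```

The trigger isn't any single number; it's the trend. A scope delta that grows every week is the early warning this section is about.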
FAQ
How much contingency should I build into an AI project budget?
Rather than a generic contingency percentage, scope specific risk lines. Integration complexity: 15% minimum, higher if the client's existing systems are poorly documented. Model iteration: 20% of development time, as its own milestone. Traditional contingency ("unknown unknowns"): avoid it. It gives false comfort and doesn't help you diagnose what's actually running over.
Why do AI projects go over budget more often than regular software?
Two reasons. First, AI components have probabilistic accuracy targets rather than binary pass/fail tests. Reaching a specific accuracy level requires iteration that’s hard to forecast. Second, AI products typically pull data from more systems than standard software, so integration complexity is higher. Both problems are solvable with the right estimation structure. They’re just different from what most software estimation frameworks assume.
What should I ask an AI development company before signing a contract?
Ask how they handle accuracy targets (are they defined before engineering starts?), how they account for integration with your existing systems, and what their process is for scope changes mid-project. A vendor who gives vague answers to those questions will likely have a vague process when things get complicated. Ask to see a Sprint 0 brief from a previous project. The quality of their planning documents tells you more than any pitch deck.
What’s a realistic timeline for AI development services?
Small projects (single-feature integrations, chatbots, proof-of-concept builds): 2-4 weeks. Medium projects (full AI features embedded in existing products, RAG systems with evaluation pipelines): 1-3 months. Large projects (end-to-end AI products with multiple models, production infrastructure, compliance requirements): 3-6 months. Any AI development agency quoting faster than this for complex work isn’t accounting for the iteration phase.
How do I know if a project is over budget from bad estimation or bad execution?
A scope audit reveals this. If the original scope is tracking on time and the budget problem is coming from additions not in the original plan, that’s estimation drift: you needed better scope locking at the start. If the original scope itself is running over, that’s execution drift, often from the integration and iteration factors above. The two problems have different solutions, so diagnosing which one you have before deciding how to respond saves time.
If you’re planning an AI build and want a frank read on whether your budget is realistic, book a 30-minute call. We’ll walk you through the scope questions that surface the expensive surprises before engineering starts.