11 min read

Why AI Projects Run Over Budget: Lessons From 20+ Builds

Five real reasons AI development projects run over budget, from 20+ builds. Budget allocation template and the patterns we stopped ignoring.

Dharini S
People and process before product — turning founder visions into shipped tech
TL;DR
  • Integration with existing systems takes 3-5x longer than teams estimate. This single factor accounts for roughly 40% of overruns we've seen.
  • Accuracy iteration is unbounded by default. Without a threshold defined before engineering starts, model tuning becomes an open-ended expense.
  • AI demos accelerate client ambition. Clients see what's possible in week two and want more. Without a written scope lock, that ambition hits the budget.
  • Most AI project budgets allocate 80% to development and nothing to iteration. We now split: 55% dev, 20% iteration, 15% integration buffer, 10% docs.
  • Prototype-first scoping catches the expensive surprises before engineering begins, not after.

The founder called on a Thursday evening. We’d shipped a working product in six weeks, which he was genuinely happy with. The budget was a different conversation.

“I thought we had a fixed price,” he said.

We did. For the scope we’d agreed on in week one. The scope in week six was not the same scope.

I’ve had that call more times than I’d like. After 20+ AI builds in the last two years, I’ve stopped being surprised by budget overruns. What surprises me now is when they don’t happen. The conditions that cause them are consistent, predictable, and almost always traceable back to the same five problems.

The Integration Tax Nobody Budgets For

Every AI product we build needs to connect to something the client already has: a CRM, a call recording platform, a data warehouse, an existing web app with a twelve-year-old API.

That integration work almost never takes the time we estimate at the start. It takes 3-5x longer. Not because engineers are slow. Because existing systems have quirks that only appear when you’re actually connecting to them.

One project: we budgeted four days to integrate with a client’s telephony provider. The provider’s documentation said they had webhook support. They did. But the webhook payload format had changed in a 2022 update, and the documentation still described the 2019 format. Finding that discrepancy cost two developers three days before we could write a single line of AI code.

Another project: a client’s internal data warehouse didn’t support date-range filtering via API. Every integration call we needed during testing required a human on the client’s side to run a manual CSV export first. We hadn’t priced in that dependency. It added a week.

What we do differently now: integration gets its own line item in every estimate, with a minimum buffer of 15% of the total scope. If I can’t get clear answers about existing systems in the discovery call, the buffer goes higher, not lower.

The questions I ask now that I didn’t use to: What does the API documentation look like? When was it last updated? Who owns the systems we’re connecting to?
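As a sketch of how answers to those questions translate into a buffer. The helper, flag names, and increments below are illustrative, not our actual pricing model; only the 15% floor comes from the text above.

```python
# Hypothetical heuristic: start at the 15% floor and widen the buffer
# each time a discovery-call answer comes back unclear or worrying.
def integration_buffer(base=0.15, *, docs_outdated=False,
                       no_system_owner=False, no_sandbox=False):
    """Return the integration buffer as a fraction of total scope."""
    buffer = base
    if docs_outdated:      # e.g. the 2019-vs-2022 webhook docs story above
        buffer += 0.05
    if no_system_owner:    # nobody on the client side owns the system
        buffer += 0.05
    if no_sandbox:         # no test environment to integrate against
        buffer += 0.05
    return buffer

# Outdated docs plus no clear owner pushes the buffer to roughly 25%.
buffer = integration_buffer(docs_outdated=True, no_system_owner=True)
```

The point of scripting it, even crudely, is that the buffer becomes an output of discovery answers rather than a negotiation.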

Accuracy Iteration Is Unbounded Without a Target

When a founder asks “how long will it take to build a model that can do X?”, there’s a truthful answer and a useful answer. The truthful answer: it depends on the accuracy you need and the quality of your training data. The useful answer requires knowing both before the project starts.

Most AI projects don’t know both at the start.

Here’s what typically happens. A team builds a classifier or a RAG retriever. It works at 72% accuracy in the first iteration. The client wanted 90%. Another round: 81%. Then 86%. Then 88%. Then three rounds trying to close the last two points, because marginal returns shrink as you approach the target.

That curve from 72% to 90% might take two weeks. Or six. Without a defined accuracy target agreed on before engineering starts, there’s no milestone that lets you call the model “done.” The iteration phase becomes an open-ended expense.

What we do differently: accuracy thresholds go into the Sprint 0 document. Not “we’ll aim for high accuracy.” Specific numbers: “the compliance checker must agree with the human reviewer at least 88% of the time on a 200-sample test set.” That test set is defined and locked. If the model hits 88%, the milestone is met. If we’re at 84% after three rounds, we have a real conversation: invest more to close the gap, or ship at 84%?

That conversation is uncomfortable in Sprint 0. It’s much worse in week seven, when the original estimate is exhausted.
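The threshold check itself is trivial to automate against the locked test set. A minimal sketch; the function name, labels, and sample data are hypothetical, and only the 88%-on-200-samples framing comes from the Sprint 0 example above.

```python
# Hypothetical milestone check: does the model agree with the human
# reviewer often enough on the locked test set?
def milestone_met(model_labels, human_labels, threshold=0.88):
    """Return (agreement_rate, passed) over a fixed test set."""
    if len(model_labels) != len(human_labels):
        raise ValueError("test set must be fixed: label counts differ")
    agreements = sum(m == h for m, h in zip(model_labels, human_labels))
    rate = agreements / len(human_labels)
    return rate, rate >= threshold

# Illustrative 200-sample set: compliance-checker output vs. human review.
model = ["ok"] * 178 + ["flag"] * 22
human = ["ok"] * 170 + ["flag"] * 30
rate, passed = milestone_met(model, human)  # rate = 0.96 → milestone met
```

Because the test set is locked, this number can't drift: either the milestone is met or the "invest more vs. ship at 84%" conversation happens on schedule.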

What Demos Do to Budgets

AI demos are unusually powerful compared to traditional software demos. When a client sees a working model classify their actual sales calls with 89% accuracy, they don’t think “great, that’s what we agreed on.” They think: “if it can do that, can it also…”

I wrote about the mechanics of this in Scope Creep in AI Projects: How I Manage It. The short version for budget purposes: every demo is a scope risk if there’s no written scope lock.

The pattern is consistent. A client starts with one feature. Sprint two demo works well. Sprint three works better. By sprint four, the backlog has three new ideas that weren’t in the original scope, and the client expects them in the original budget.

This isn’t malicious. Clients respond to what they’re seeing in real time, and assume the original estimate had headroom for their enthusiasm. It almost never does.

What the written scope lock does specifically: it makes the original agreement concrete. When a client proposes something new, I can say “that’s not in scope for this project, but let me scope it for the next sprint.” The document makes the boundary visible without making it personal. You can read how we build that document during the discovery call process here.

Infrastructure and Compliance Surprises

Building an AI product in a clean development environment is very different from building one that has to meet real-world constraints: HIPAA, GDPR, SOC 2, a corporate security review, or infrastructure requirements from a larger partner the client forgot to mention.

We ran a project for an enterprise client where, four weeks in, they mentioned almost in passing that all data processing had to happen within their existing AWS VPC. We’d been building against a managed vector database service. Migrating to a self-hosted solution running inside their VPC added two weeks we hadn’t budgeted.

Another project: a healthcare client needed patient data to stay within India’s geographic boundaries. The third-party transcription API we’d selected didn’t have data residency options. Finding an alternative with comparable accuracy and compliant hosting took a week we hadn’t priced.

Neither client was being difficult. They just didn’t know, at the start, that these constraints were relevant. And we hadn’t asked specifically enough.

What we ask in every discovery call now: Are there data residency requirements? What are the information security policies we need to work within? Is there an IT security review process for new vendors? Are there existing infrastructure commitments that affect where we deploy?

These aren’t AI-specific questions. They’re standard software delivery questions. But AI projects add surface area because they touch data more extensively and bring new third-party services into the stack.

The Estimation Approach We Had to Rebuild

Early in our work, we estimated AI projects the same way we’d estimate standard software: based on features and complexity, with a margin for testing and QA. That worked fine for the software we’d built before. It doesn’t work for AI.

Here’s the PM version of why: the “done” test for an AI feature is different from any feature test I’d seen before. You can’t declare a model finished the way you declare a form submission finished. It’s finished when it’s accurate enough, and “accurate enough” requires rounds of evaluation that nobody has a line item for. That gap between “technically working” and “production-ready” is where most of the budget goes.

What our estimates look like now (Atlassian’s agile estimation guide covers the mechanics well for traditional software; the AI-specific additions are the iteration and integration rows):

Component                 Old allocation        New allocation
Development               80%                   55%
AI model iteration        0% (buried in dev)    20%
Integration buffer        5%                    15%
Documentation + handoff   5%                    10%
Contingency               10%                   0% (moved to explicit line items)

The difference isn’t budgeting more money overall. It’s budgeting the right things. When iteration is hidden inside “development,” clients see the development number and assume the 10% contingency is generous padding. When the iteration runs long, the project runs over.

Making iteration explicit does two things: it gives clients a realistic picture of where time goes, and it creates a specific milestone that can trigger a conversation if the iteration takes longer than expected.

We also stopped using contingency as a catch-all buffer. Instead, we scope the most predictable unknown (integration complexity) at a fixed percentage sized against the quality of the client’s existing documentation. A 15% integration buffer is more honest than a 10% miscellaneous contingency that nobody knows how to measure.
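Turning the new split into explicit line items is mechanical. A minimal sketch, assuming a 400-hour estimate; the helper name and hour figure are ours, while the percentages come from the allocation table above.

```python
# Percentages from the new allocation described in the article.
NEW_ALLOCATION = {
    "development": 0.55,
    "ai_model_iteration": 0.20,
    "integration_buffer": 0.15,
    "documentation_handoff": 0.10,
}

def line_items(total_hours, allocation=NEW_ALLOCATION):
    """Split a total estimate into explicit line items (rounded hours)."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-9, "shares must sum to 100%"
    return {name: round(total_hours * share) for name, share in allocation.items()}

# A hypothetical 400-hour estimate becomes 220 dev / 80 iteration /
# 60 integration buffer / 40 documentation-and-handoff.
items = line_items(400)
```

The assertion matters: if someone quietly shaves the iteration share to make the dev number look bigger, the split no longer sums and the estimate fails loudly.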

What to Do If You’re Already Over Budget

If you’re mid-project and the numbers don’t match the plan, start with a scope audit, not a budget conversation.

A scope audit takes about two hours: list everything in the active backlog, categorize it as (1) in original scope, (2) added during the project, or (3) technically required but not anticipated. Then look at how much remaining budget is allocated to each category.

Most over-budget projects I’ve reviewed aren’t over because engineering is slow. They’re over because categories 2 and 3 are larger than anyone realized. Once that’s visible, the client and team can make a real decision: cut category 2 additions, extend the budget, or ship category 1 and plan a phase two.
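The audit itself is simple enough to script. A sketch with hypothetical backlog data; the category constants and items are illustrative, but the three categories are the ones described above.

```python
from collections import defaultdict

# The three audit categories from the scope audit described above.
ORIGINAL = "in_original_scope"
ADDED = "added_mid_project"
REQUIRED = "required_but_unanticipated"

# Hypothetical backlog with remaining budget (in hours) per item.
backlog = [
    {"item": "call classifier",      "category": ORIGINAL, "remaining_hours": 40},
    {"item": "sentiment dashboard",  "category": ADDED,    "remaining_hours": 35},
    {"item": "VPC migration",        "category": REQUIRED, "remaining_hours": 25},
    {"item": "eval pipeline",        "category": ORIGINAL, "remaining_hours": 20},
]

def scope_audit(items):
    """Total the remaining hours allocated to each audit category."""
    totals = defaultdict(int)
    for it in items:
        totals[it["category"]] += it["remaining_hours"]
    return dict(totals)

# If ADDED plus REQUIRED rivals ORIGINAL, the overrun isn't an
# engineering-speed problem, and the decision options change.
totals = scope_audit(backlog)
```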

The harder conversation is when infrastructure surprises have consumed the contingency. In those cases, I’m direct: “We’re in a situation we didn’t anticipate. Here’s what it cost, here’s what it bought us, here’s what we need to finish. What do you want to do?” The PMI’s guide on project change management covers the formal approvals side if you need a documented chain. But ambiguity about where the money went is almost always worse than the direct conversation.

For ongoing projects, the best investment is a weekly budget check alongside the sprint review. We track three numbers every week: hours consumed versus plan, estimated hours to completion, and the delta between original scope and current scope. If the delta is growing faster than expected, we raise it before it becomes a problem, not after.
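Those three numbers need nothing more than a spreadsheet row, but as a sketch (the function, field names, and figures are illustrative):

```python
# Hypothetical weekly budget check: the three numbers tracked alongside
# the sprint review.
def weekly_check(hours_consumed, planned_to_date, est_hours_to_complete,
                 original_scope_items, current_scope_items):
    return {
        "consumed_vs_plan": hours_consumed - planned_to_date,  # positive = burning fast
        "est_to_complete": est_hours_to_complete,
        "scope_delta": current_scope_items - original_scope_items,
    }

# Example week: 20 hours over plan and 4 items of scope growth —
# raise it at this week's review, not after the budget is gone.
status = weekly_check(hours_consumed=130, planned_to_date=110,
                      est_hours_to_complete=220,
                      original_scope_items=14, current_scope_items=18)
```

The scope delta is the early-warning number: hours can lag plan for benign reasons, but a delta that grows week over week is the demo-driven ambition from earlier showing up in the backlog.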

FAQ

How much contingency should I build into an AI project budget?

Rather than a generic contingency percentage, scope specific risk lines. Integration complexity: 15% minimum, higher if the client’s existing systems are poorly documented. Model iteration: 20% of development time, as its own milestone. Traditional contingency (“unknown unknowns”): avoid it. It gives false comfort and doesn’t help you diagnose what’s actually running over.

Why do AI projects go over budget more often than regular software?

Two reasons. First, AI components have probabilistic accuracy targets rather than binary pass/fail tests. Reaching a specific accuracy level requires iteration that’s hard to forecast. Second, AI products typically pull data from more systems than standard software, so integration complexity is higher. Both problems are solvable with the right estimation structure. They’re just different from what most software estimation frameworks assume.

What should I ask an AI development company before signing a contract?

Ask how they handle accuracy targets (are they defined before engineering starts?), how they account for integration with your existing systems, and what their process is for scope changes mid-project. A vendor who gives vague answers to those questions will likely have a vague process when things get complicated. Ask to see a Sprint 0 brief from a previous project. The quality of their planning documents tells you more than any pitch deck.

What’s a realistic timeline for AI development services?

Small projects (single-feature integrations, chatbots, proof-of-concept builds): 2-4 weeks. Medium projects (full AI features embedded in existing products, RAG systems with evaluation pipelines): 1-3 months. Large projects (end-to-end AI products with multiple models, production infrastructure, compliance requirements): 3-6 months. Any AI development agency quoting faster than this for complex work isn’t accounting for the iteration phase.

How do I know if a project is over budget from bad estimation or bad execution?

A scope audit reveals this. If the original scope is tracking on time and the budget problem is coming from additions not in the original plan, that’s estimation drift: you needed better scope locking at the start. If the original scope itself is running over, that’s execution drift, often from the integration and iteration factors above. The two problems have different solutions, so diagnosing which one you have before deciding how to respond saves time.


If you’re planning an AI build and want a frank read on whether your budget is realistic, book a 30-minute call. We’ll walk you through the scope questions that surface the expensive surprises before engineering starts.

Tags: ai development services, ai development company, project management, ai project budget, delivery process, ai builds


Written by Dharini S
People and process before product — turning founder visions into shipped tech

Dharini sits between the founder's vision and the engineering team, making sure things move in the right direction — whether that's a full-stack product, an LLM integration, or an agent-based solution. Her background in instructional design and program management means she thinks about people first — how they process information, where they get stuck, what they actually need — before jumping to solutions.

Kalvium Labs
AI products for startups