Case Study

Sales Call Compliance AI: Every Call Reviewed, Automatically

An AI system that transcribes, diarizes, and scores every sales call against a compliance rubric. Replaced manual QA that covered 5% of calls with automated analysis of 100%.

Client: Enterprise Tech Company
Industry: Enterprise Sales / Compliance
Duration: 2 weeks
Team: 3 engineers + PM

40% compliance improvement
95% QA cost reduction
100% call coverage (was 5%)
2 weeks from kickoff to production

The Problem

The client's sales team handled hundreds of calls per week. Their compliance team manually reviewed about 5% of them. That 95% gap wasn't a staffing issue. It was math: a human reviewer takes 20-30 minutes to properly audit a 15-minute call against a compliance checklist. You can't hire enough reviewers to cover everything without the cost becoming absurd.
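Here is a small sketch that makes the math concrete. It assumes 300 calls per week as a stand-in for "hundreds" and 25 minutes per review, the midpoint of the 20-30 minute range. The function name is hypothetical.

```python
def reviewer_hours_per_week(calls_per_week: int, minutes_per_review: float, coverage: float) -> float:
    """Reviewer-hours needed to audit `coverage` (a fraction from 0 to 1) of the week's calls."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # calls actually reviewed * minutes each, converted to hours
    return calls_per_week * coverage * minutes_per_review / 60.0


# Assumed inputs: 300 calls/week, 25 minutes per review.
sampled = reviewer_hours_per_week(300, 25, 0.05)  # today's 5% sample
full = reviewer_hours_per_week(300, 25, 1.0)      # every call
```

Under those assumptions, a 5% sample costs about 6 reviewer-hours a week. Full coverage costs 125, which is roughly three full-time reviewers doing nothing else.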

Compliance issues were getting missed. Calls that should have included specific disclosures, consent language, and product explanations often didn't. The team only found out during audits or, worse, customer complaints.

What We Built

An end-to-end pipeline that automatically reviews every sales call for compliance:

Audio ingestion via webhook from the client's call recording platform
Transcription using Deepgram Nova-2 (switched from Whisper after testing on noisy call audio)
Speaker diarization to isolate rep speech from customer speech
LLM compliance scoring against a structured rubric with per-section pass/fail and reasoning
Dashboard showing scores, trends, and specific failure explanations per call
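After diarization has labeled each segment, isolating the rep's speech is mostly bookkeeping. The sketch below assumes the diarizer returns labeled segments with start and end times. The `Segment` shape and the "rep speaks first" heuristic are illustrative assumptions. This case study does not say how the production system identified the rep.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # diarization label, e.g. "SPEAKER_00" (label format is an assumption)
    start: float  # seconds from start of call
    end: float
    text: str


def guess_rep_label(segments: list[Segment]) -> str:
    """Hypothetical heuristic: on outbound calls, assume the rep speaks first."""
    if not segments:
        raise ValueError("no segments")
    return min(segments, key=lambda s: s.start).speaker


def rep_turns(segments: list[Segment], rep_label: str) -> list[str]:
    """Return the rep's speech as turns, merging back-to-back rep segments into one turn."""
    turns: list[str] = []
    previous_speaker = None
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.speaker == rep_label:
            if previous_speaker == rep_label and turns:
                turns[-1] += " " + seg.text  # same speaker kept talking
            else:
                turns.append(seg.text)       # rep starts a new turn
        previous_speaker = seg.speaker
    return turns
```

When the recording platform provides separate audio channels for rep and customer, that is a more reliable signal than any ordering heuristic.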

Key insight: The hardest part wasn't the AI. It was getting the client to write down exactly what "compliant" means. That took two days of workshops before we wrote a single line of pipeline code. Without a precise rubric, no model can score compliance accurately.
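Writing down what "compliant" means produces something a pipeline can consume. Below is one way to encode it, with hypothetical field names (this is not the client's actual rubric schema). Each section has a plain-language requirement plus examples of accepted and rejected phrasings. A check flags vague gaps before any scoring runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RubricSection:
    section_id: str
    requirement: str  # what the rep must do, in plain language
    passing_examples: tuple[str, ...] = ()  # phrasings the client accepts as compliant
    failing_examples: tuple[str, ...] = ()  # near-misses the client rejects


@dataclass
class Rubric:
    name: str
    sections: list[RubricSection] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List gaps that would make this rubric too vague to score consistently."""
        issues: list[str] = []
        seen: set[str] = set()
        for s in self.sections:
            if s.section_id in seen:
                issues.append(f"duplicate section id: {s.section_id}")
            seen.add(s.section_id)
            if not s.requirement.strip():
                issues.append(f"{s.section_id}: requirement is empty")
            if not s.passing_examples:
                issues.append(f"{s.section_id}: no accepted phrasing examples")
        if not self.sections:
            issues.append("rubric has no sections")
        return issues
```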

Technical Decisions

Why Deepgram over Whisper: Whisper large-v3 hit ~4% word error rate on clean audio but degraded to ~18% on noisy call recordings (speakerphone, mobile). Deepgram Nova-2 held at ~6% WER across quality levels and processed each minute of audio in 400ms, versus 3.2 seconds for Whisper. That is an 8x speed difference, and at 200+ calls per day it compounds: assuming 15-minute calls, it works out to roughly 20 minutes of transcription time per day versus about 2.7 hours.
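WER is word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is the textbook definition, not necessarily the exact scoring script behind the numbers above. Real evaluations usually normalize punctuation and numbers first, and this sketch only lowercases.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # One row of the Levenshtein table: distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words gives a WER of 0.25.
example = word_error_rate("we record these calls", "we record the calls")
```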

Why LLM scoring over keyword matching: Our first prototype used keyword detection, and it reached only 58% accuracy. Compliance isn't about exact phrases. A rep saying "just so you know, we record these calls" is compliant even though it doesn't match the script verbatim. LLM analysis against a structured rubric reached 94% agreement with human reviewers.
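The keyword failure mode is easy to reproduce. The check below is deliberately naive, and its phrase list is hypothetical, not the client's script. It appears next to a simple percent-agreement function, one plausible way to compute a figure like "agreement with human reviewers". That function assumes plain per-item agreement, not a chance-corrected statistic.

```python
import re

# Hypothetical script phrases a keyword matcher might require for a recording disclosure.
REQUIRED_PHRASES = ("this call is being recorded", "this call may be recorded")


def keyword_disclosure_check(rep_text: str) -> bool:
    """Naive baseline: pass only if an exact script phrase appears."""
    normalized = re.sub(r"[^a-z ]+", " ", rep_text.lower())  # drop punctuation
    normalized = " ".join(normalized.split())                # collapse whitespace
    return any(phrase in normalized for phrase in REQUIRED_PHRASES)


def agreement_rate(model_labels: list[bool], human_labels: list[bool]) -> float:
    """Share of items where the automated verdict matches the human reviewer's."""
    if len(model_labels) != len(human_labels) or not human_labels:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


# The compliant paraphrase from the text fails the keyword check.
paraphrase_passes = keyword_disclosure_check("Just so you know, we record these calls.")
```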

Scoring approach: GPT-4o evaluates each compliance section independently, returning a pass/fail verdict plus a 2-3 sentence explanation. The per-section breakdown gives managers actionable coaching feedback, not just a single number.
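Below is a minimal sketch of the per-section pattern with the model call left out: one prompt per section, plus a strict parser for the reply. The JSON reply shape, prompt wording, and function names are assumptions for illustration. In production, the reply string would come from the GPT-4o request, ideally with the API's structured-output options turned on.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionResult:
    section_id: str
    passed: bool
    explanation: str


def build_section_prompt(section_id: str, requirement: str, rep_transcript: str) -> str:
    """One prompt per rubric section, so each verdict is judged on its own."""
    return (
        "You are auditing a sales call for one compliance requirement.\n"
        f"Requirement ({section_id}): {requirement}\n"
        "Transcript of the sales rep:\n"
        f"{rep_transcript}\n\n"
        'Reply with JSON only: {"passed": true or false, "explanation": "2-3 sentences"}'
    )


def parse_section_result(section_id: str, raw_reply: str) -> SectionResult:
    """Validate the model's JSON reply; raise on anything malformed instead of guessing."""
    data = json.loads(raw_reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError(f"{section_id}: reply must be a JSON object")
    if not isinstance(data.get("passed"), bool):
        raise ValueError(f"{section_id}: 'passed' must be a boolean")
    explanation = data.get("explanation")
    if not isinstance(explanation, str) or not explanation.strip():
        raise ValueError(f"{section_id}: missing explanation")
    return SectionResult(section_id, data["passed"], explanation.strip())
```

Splitting by section means one model request per section per call. That extra cost buys the per-section explanations managers use for coaching.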

Tech Stack

Python, FastAPI, Deepgram Nova-2, pyannote.audio, GPT-4o, Redis, React, PostgreSQL

Results

After six weeks in production:

40% improvement in overall sales compliance scores across the team
95% reduction in QA review cost (humans now only review flagged edge cases)
100% call coverage versus 5% before: every call gets reviewed, not a sample
Same-day feedback instead of the 2-3 week delay of manual review cycles

The compliance team shifted from reviewing calls to reviewing the AI's edge-case flags and coaching reps based on the AI's per-section breakdowns. That's a better use of their time.

Read the Full Build Story

We wrote a detailed technical walkthrough of this project, including both wrong turns we took before landing on the final architecture: How We Built a Sales Call Compliance AI in 2 Weeks.

