Sales Call Compliance AI: Every Call Reviewed, Automatically
An AI system that transcribes, diarizes, and scores every sales call against a compliance rubric. Replaced manual QA that covered 5% of calls with automated analysis of 100%.
The Problem
The client's sales team handled hundreds of calls per week. Their compliance team manually reviewed about 5% of them. That 95% gap wasn't a staffing issue. It was math: a human reviewer takes 20-30 minutes to properly audit a 15-minute call against a compliance checklist. You can't hire enough reviewers to cover everything without the cost becoming absurd.
Compliance issues were getting missed. Calls that should have included specific disclosures, consent language, and product explanations often didn't. The team only found out during audits or, worse, customer complaints.
What We Built
An end-to-end pipeline that automatically reviews every sales call for compliance:
- Audio ingestion via webhook from the client's call recording platform
- Transcription using Deepgram Nova-2 (switched from Whisper after testing on noisy call audio)
- Speaker diarization to isolate rep speech from customer speech
- LLM compliance scoring against a structured rubric with per-section pass/fail and reasoning
- Dashboard showing scores, trends, and specific failure explanations per call
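The five stages above can be sketched as one orchestration function. This is a minimal illustration, not the production code: the stage names, signatures, and the `"rep"` speaker label are all assumptions, and the real pipeline would inject actual Deepgram and GPT-4o clients for the stage callables.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CallReview:
    call_id: str
    transcript: str
    rep_lines: list      # rep-only utterances after diarization
    section_results: dict  # rubric section -> {"pass": bool, "reason": str}

def review_call(call_id: str,
                fetch_audio: Callable[[str], bytes],
                transcribe: Callable[[bytes], list],  # -> [(speaker, text), ...]
                score: Callable[[str], dict]) -> CallReview:
    """Run one call through ingest -> transcribe/diarize -> score."""
    audio = fetch_audio(call_id)          # webhook payload points at the recording
    utterances = transcribe(audio)        # diarized transcript
    rep_lines = [text for spk, text in utterances if spk == "rep"]
    transcript = " ".join(text for _, text in utterances)
    results = score("\n".join(rep_lines))  # LLM rubric scoring on rep speech only
    return CallReview(call_id, transcript, rep_lines, results)
```

Passing the stages in as callables keeps the orchestration testable with stubs, independent of any vendor SDK.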
Key insight: The hardest part wasn't the AI. It was getting the client to write down exactly what "compliant" means. That took two days of workshops before we wrote a single line of pipeline code. Without a precise rubric, no model can score compliance accurately.
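A rubric that a model can score has to be machine-readable data, not a prose document. The shape below is an illustrative example of what came out of those workshops; the section names and requirement wording are invented, not the client's actual rubric.

```python
# Each section is scored independently, so each entry must stand alone:
# one named check, one unambiguous requirement.
RUBRIC = [
    {
        "section": "recording_disclosure",
        "requirement": "Rep informs the customer the call is recorded "
                       "before any product discussion begins.",
    },
    {
        "section": "consent_language",
        "requirement": "Customer gives explicit verbal consent before "
                       "any account changes are made.",
    },
    {
        "section": "product_explanation",
        "requirement": "Rep explains fees and cancellation terms in plain "
                       "language, not just by naming the product.",
    },
]
```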
Technical Decisions
Why Deepgram over Whisper: Whisper large-v3 hit ~4% word error rate on clean audio, but degraded to ~18% on noisy call recordings (speakerphone, mobile). Deepgram Nova-2 held at ~6% WER across quality levels and processed a minute of audio in roughly 400 ms, versus 3.2 seconds for Whisper. At 200+ calls per day, that gap compounds.
Why LLM scoring over keyword matching: Our first prototype used keyword detection. It scored 58% accuracy. Compliance isn't about exact phrases. A rep saying "just so you know, we record these calls" is compliant even though it doesn't match the script verbatim. LLM analysis against a structured rubric reached 94% agreement with human reviewers.
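The failure mode is easy to demonstrate. A minimal version of the first prototype's check, with the script phrase as an illustrative example:

```python
SCRIPT_PHRASE = "this call is being recorded"

def keyword_compliant(transcript: str) -> bool:
    # Naive first-prototype approach: exact substring match on the script phrase.
    return SCRIPT_PHRASE in transcript.lower()
```

A rep who says "Hi there, this call is being recorded" passes, but the equally compliant "Just so you know, we record these calls" fails, because meaning survives paraphrase and substring matching does not.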
Scoring approach: GPT-4o evaluates each compliance section independently, returning a pass/fail plus a 2-3 sentence explanation. The per-section breakdown gives managers actionable coaching feedback, not just a single number.
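A per-section evaluation reduces to building one focused prompt per rubric section. The sketch below shows the shape of that prompt under stated assumptions: the section dict keys and the JSON response contract are illustrative, not the production prompt.

```python
def build_scoring_messages(section: dict, rep_transcript: str) -> list:
    """Chat messages for one independent per-section pass/fail evaluation."""
    system = (
        "You are a sales-call compliance reviewer. Evaluate ONLY the "
        "requirement given. Respond with JSON: "
        '{"pass": true or false, "explanation": "<2-3 sentences>"}'
    )
    user = (
        f"Requirement ({section['section']}): {section['requirement']}\n\n"
        f"Rep transcript:\n{rep_transcript}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# With the openai SDK this would be sent roughly like:
#   client.chat.completions.create(model="gpt-4o", messages=messages,
#                                  response_format={"type": "json_object"})
```

Scoring sections independently, rather than asking for one score over the whole rubric, is what makes the per-section breakdown reliable enough for coaching feedback.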
Tech Stack
- Transcription and diarization: Deepgram Nova-2
- Compliance scoring: GPT-4o against a structured rubric
- Ingestion: webhook from the client's call recording platform
- Reporting: dashboard with per-call scores, trends, and failure explanations
Results
After six weeks in production:
- 40% improvement in overall sales compliance scores across the team
- 95% reduction in QA review cost (humans now review only flagged edge cases)
- 100% call coverage versus 5% before. Every call gets reviewed, not a sample
- Same-day feedback instead of the 2-3 week delay of manual review cycles
The compliance team shifted from reviewing calls to reviewing the AI's edge-case flags and coaching reps based on the AI's per-section breakdowns. That's a better use of their time.
Read the Full Build Story
We wrote a detailed technical walkthrough of this project, including both wrong turns we took before landing on the final architecture: How We Built a Sales Call Compliance AI in 2 Weeks.
Want something like this built?
Tell us the problem. We'll tell you what 72 hours can produce.