
A payments company we worked with was confident in their compliance posture. They recorded 100% of calls, they had pause-and-resume for card capture, they ran quarterly PCI assessments, and they’d never had a finding. The head of compliance described the program to us as “buttoned up.”
We ran a single test against six months of their recordings. We searched for spoken card numbers in segments where pause-and-resume should have suppressed them. We found card data in roughly 0.4% of relevant calls: agents who’d forgotten to trigger the pause, customers who’d read their number before the agent was ready, system failures where the pause didn’t engage. In a center handling 200,000 card-payment calls a quarter, 0.4% is 800 calls with exposed PAN data sitting in the recording archive.
The compliance program wasn’t buttoned up. It was buttoned up everywhere a human had thought to put a button. The 0.4% lived in the gap between the policy and the execution, which is exactly where compliance findings come from.
For most of the last decade, contact center compliance was a documentation exercise. You had a policy. You trained agents on it. You sampled some calls. You filed the assessment. The regulator generally accepted that a reasonable program, reasonably executed, was sufficient.
That standard has shifted across several frameworks in the past 24-36 months, and the common direction is the same: from “do you have a program” to “can you prove it worked on every interaction.”
PCI DSS 4.0.1, mandatory since March 2025, tightened the requirements around sensitive authentication data in recordings substantially. Pause-and-resume is no longer presumptively sufficient in environments where card numbers are spoken. Every administrative action on recordings (playback, download, export) needs an audit trail. The standard assumes you can demonstrate control across the full archive, not across a sample.
The EU AI Act’s high-risk provisions, phasing in through 2026, classify AI systems used for contact center performance monitoring and employment decisions as high-risk, with logging, oversight, and transparency obligations that apply to any organization whose AI touches EU residents.
In financial services, off-channel communications enforcement in the US has produced billions in penalties since 2021, with regulators making clear that inability to demonstrate monitoring equals liability.
The pattern is consistent. The regulator no longer accepts the program as evidence. They want the outcome, across every interaction.
When compliance runs on a 2-5% sample, three categories of exposure are structurally invisible.
The execution gap. Policies are correct; execution is imperfect. The pause-and-resume that didn’t engage. The disclosure the agent skipped. The consent that wasn’t captured. These happen at low rates per call but high absolute volumes, and a sample large enough to reliably catch a 0.4% failure rate, let alone measure it, would have to be far larger than most manual QA teams can review.
The drift gap. Compliance behavior decays over time as agents get comfortable, as edge cases accumulate, as new products launch without updated scripts. Sampling catches drift only after it’s become common enough to show up in a small sample, by which point it’s been happening for months.
The systems gap. Some compliance failures aren’t about agents at all. A recording system that fails to engage on certain call types. A redaction process that misses certain number formats. These are invisible to call-level QA because they’re infrastructure failures, and they’re often the most serious because they’re systematic rather than occasional.
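To put rough numbers on why a small sample misses a low-rate failure, here is a minimal sketch of the arithmetic. It assumes calls are drawn by simple random sampling and that failures are independent, which real QA sampling may not satisfy. The function names are illustrative, not from any particular tool.

```python
import math

def prob_at_least_one(failure_rate: float, sample_size: int) -> float:
    """Chance that a random sample contains at least one failing call.

    Assumes independent failures and simple random sampling.
    """
    return 1.0 - (1.0 - failure_rate) ** sample_size

def sample_size_for(failure_rate: float, confidence: float) -> int:
    """Smallest random sample that contains a failing call with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

if __name__ == "__main__":
    rate = 0.004  # the 0.4% failure rate used as the running example
    for n in (50, 200, 1000):
        print(f"sample of {n}: {prob_at_least_one(rate, n):.1%} chance of seeing a failure")
    print("calls needed for a 95% chance:", sample_size_for(rate, 0.95))
```

Under these assumptions, a 200-call sample has only about a 55% chance of containing a single 0.4% failure, and it takes roughly 748 randomly chosen calls to reach 95%. Seeing one failure is also much weaker than measuring the rate or noticing that it has drifted, which takes considerably more.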
When speech analytics runs compliance checks against 100% of interactions instead of a sample, the compliance function changes character.
Detection moves from retrospective to near-real-time. A disclosure failure is flagged within hours of the call, not weeks later in a QA cycle. The remediation window shrinks from “next coaching session” to “same day.”
Evidence becomes comprehensive. When a regulator asks how you monitor a given requirement, the answer shifts from “we sample 200 calls a quarter” to “we screen every interaction against this criterion and here’s the exception report.” The second answer is defensible in a way the first one increasingly isn’t.
Exposure becomes quantified. Instead of assuming the sample represents the whole, you know the actual failure rate across the full volume. The 0.4% becomes a number you can manage rather than a risk you can’t see.
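As an illustration of what an exception report could look like once every interaction is screened, the sketch below rolls per-call check results up into counts and failure rates per criterion. The `CheckResult` shape, the criterion names, and the `exception_report` function are all hypothetical; an actual speech analytics platform will expose its results differently.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class CheckResult:
    """One automated compliance check on one call (hypothetical record shape)."""
    call_id: str
    criterion: str  # e.g. "pan_suppressed" or "disclosure_read" (illustrative names)
    passed: bool

def exception_report(results: list[CheckResult]) -> dict[str, dict[str, float]]:
    """Summarise screened volume, failures, and failure rate per criterion."""
    screened = Counter(r.criterion for r in results)
    failed = Counter(r.criterion for r in results if not r.passed)
    return {
        criterion: {
            "screened": screened[criterion],
            "failed": failed[criterion],
            "failure_rate": failed[criterion] / screened[criterion],
        }
        for criterion in screened
    }

if __name__ == "__main__":
    # Synthetic data: 1,000 screened calls, 4 of which failed PAN suppression.
    demo = [
        CheckResult(f"call-{i}", "pan_suppressed", passed=(i >= 4))
        for i in range(1000)
    ]
    print(exception_report(demo)["pan_suppressed"])
```

Because every call is screened, the failure rate here is an observed count rather than an estimate from a sample, and the list of failing call IDs can be produced directly as evidence.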
1. Search your recording archive for exposed sensitive data. Pick one regulated data type in your environment, such as card numbers. Search recent recordings for instances that should have been suppressed. The hit rate will tell you your real execution gap.
2. Map every compliance requirement to its evidence. For each obligation that touches conversations, answer: “If a regulator asked for proof of monitoring, what would we produce?” Any requirement where the answer is “a sample” is an exposure.
3. Audit your administrative access logs. PCI DSS 4.0.1 requires audit trails on recording access. Check whether yours capture playback, download, and export with user IDs and timestamps. Most legacy systems don’t.
4. Test one systems-level control. Pick your pause-and-resume or redaction process. Verify it engages correctly across every call type, not just the common ones. Systems gaps hide in the uncommon paths.
5. Calculate your sample’s statistical power. For your most important compliance requirement, work out the failure rate your current sample size could reliably detect. If it’s higher than your acceptable risk threshold, your monitoring is structurally inadequate regardless of how well it’s executed.
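The search in step 1 can be sketched as a scan over call transcripts. This is a minimal illustration, not a production detector. It assumes transcripts are already available as text with spoken digits rendered as numerals, which depends on your transcription setup. Candidates are found with a regular expression that tolerates spaces and hyphens, then filtered with the Luhn checksum that payment card numbers satisfy. The names `find_pans` and `hit_rate` are hypothetical, and hits are masked so the report does not become a second copy of the exposed data.

```python
import re

# Candidate PANs: 13 to 19 digits, each optionally preceded by one space or hyphen.
PAN_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def mask(digits: str) -> str:
    """Keep the first six and last four digits so the report holds no full PAN."""
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]

def find_pans(transcript: str) -> list[str]:
    """Return masked card-number candidates that pass the Luhn check."""
    hits = []
    for match in PAN_CANDIDATE.finditer(transcript):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(mask(digits))
    return hits

def hit_rate(transcripts: dict[str, str]) -> float:
    """Fraction of transcripts (keyed by call ID) containing at least one hit."""
    if not transcripts:
        return 0.0
    flagged = sum(1 for text in transcripts.values() if find_pans(text))
    return flagged / len(transcripts)

if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    sample = {
        "call-001": "sure, it's 4111 1111 1111 1111, expiring next year",
        "call-002": "my order reference is 4111111111111112",
        "call-003": "I'd like to update my address",
    }
    print(find_pans(sample["call-001"]))
    print(hit_rate(sample))
```

Running the same scan separately per call type or per number format is one way to approach step 4, since a pattern that only matches unbroken 16-digit strings would miss numbers read out in groups.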
Compliance findings don’t come from the calls you reviewed. They come from the calls you didn’t, in the gap between the policy you wrote and the execution you assumed. The 0.4% is always there. The only question is whether you find it before the regulator does.