A contact center we worked with ran a rigorous quarterly performance review process. Each agent got a formal evaluation built on QA scores, attendance, adherence, and a manager narrative. The process was documented, fair on paper, and universally disliked. When we asked agents what they thought of it, the most common response was a version of “it doesn’t describe me.”

We looked at what the reviews were actually built on. The QA component drew on five scored calls per month, or fifteen calls per quarter, out of roughly 1,500 the agent handled. The manager narrative drew heavily on recent events, because that’s what managers remember. The review that was supposed to assess a quarter of work was actually assessing fifteen calls and the last three weeks. The agents were right. It didn’t describe them, because it was never built from enough data to describe anyone.

This is the structural flaw in nearly every contact center performance review. The format implies comprehensive assessment. The inputs are a tiny, non-random sample plus recency bias. The gap between what the review claims to measure and what it actually measures is enormous, and everyone involved can feel it even when they can’t articulate it.

Why the Format Fails

The quarterly review has three input problems that no amount of process rigor can fix.

The sample is too small. Fifteen scored calls cannot represent fifteen hundred. Whatever pattern the review identifies might be real or might be sampling noise, and there’s no way to tell which. An agent who had two bad calls among their fifteen reviewed looks worse than an agent whose two bad calls happened to fall outside the sample, even if their true performance is identical.
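To put rough numbers on that noise, here is a minimal sketch. The 1,500 calls and the fifteen-call sample come from the example above; the 5% true bad-call rate is purely an assumption for illustration, and so is the idea that the fifteen calls are drawn at random (which, as the next point argues, is generous).

```python
from math import comb

def prob_bad_in_sample(total_calls, bad_calls, sample_size, k):
    """Hypergeometric probability that exactly k of the sampled calls are bad,
    assuming the sample is drawn at random without replacement."""
    return (comb(bad_calls, k)
            * comb(total_calls - bad_calls, sample_size - k)
            / comb(total_calls, sample_size))

TOTAL_CALLS = 1500  # calls handled in the quarter (from the article)
SAMPLE_SIZE = 15    # calls scored by QA (from the article)
BAD_CALLS = 75      # ASSUMPTION: a 5% true bad-call rate, for illustration only

p_zero = prob_bad_in_sample(TOTAL_CALLS, BAD_CALLS, SAMPLE_SIZE, 0)
p_one = prob_bad_in_sample(TOTAL_CALLS, BAD_CALLS, SAMPLE_SIZE, 1)
p_two_or_more = 1 - p_zero - p_one

print(f"No bad calls in the sample:       {p_zero:.1%}")
print(f"Two or more bad calls in sample:  {p_two_or_more:.1%}")
```

Under these assumptions, an agent has roughly a 46% chance of a spotless sample and roughly a 17% chance of a sample with two or more bad calls. Two agents with identical work can land on either side of that split purely by luck.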

The sample is biased. Calls selected for QA aren’t random: they’re often the ones agents knew were being monitored, or the ones a manager flagged, or the ones that fit the QA schedule. Each selection method introduces bias that the review then treats as representative.

The narrative is recency-weighted. Human memory privileges recent and emotional events. A manager writing a quarterly narrative remembers last week’s escalation and last month’s win far more vividly than the steady, competent work of weeks five through eight. The narrative describes the memorable quarter, not the actual one.

What Continuous Measurement Changes

When agent management runs on 100% of interactions instead of a quarterly sample, the review changes from an event to a reflection of an ongoing record.

The assessment becomes comprehensive. Every call contributes to the picture. The agent’s actual patterns, across all their work and not just fifteen calls, are what the review reflects. Sampling noise disappears because there’s no sample.

The feedback becomes continuous. Instead of accumulating observations for a quarterly download, coaching happens in near-real-time as patterns emerge. The quarterly review stops being the moment the agent learns how they’re doing and becomes a summary of conversations that have been happening all along.

The bias becomes correctable. Recency bias is a human cognitive limitation. A system that weighs all of an agent’s work equally doesn’t have it. Managers can still add human judgment and context, but the foundation is comprehensive rather than memory-dependent.
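As a toy illustration of the difference, the sketch below compares an equal-weight average of a quarter with a recency-weighted one. The weekly scores and the decay factor are hypothetical, and the exponential decay is only a stand-in for how memory discounts older weeks, not a model of any real manager.

```python
def equal_weight_mean(weekly_scores):
    """Every week of the quarter counts the same."""
    return sum(weekly_scores) / len(weekly_scores)

def recency_weighted_mean(weekly_scores, decay=0.7):
    """Weight each week by decay ** weeks_ago, so the latest week has weight 1.
    ASSUMPTION: exponential decay is an illustrative stand-in for memory."""
    n = len(weekly_scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * s for w, s in zip(weights, weekly_scores))
    return weighted_sum / sum(weights)

# HYPOTHETICAL quarter: eleven steady weeks, then one bad final week.
weekly_scores = [88] * 11 + [60]

equal = equal_weight_mean(weekly_scores)
remembered = recency_weighted_mean(weekly_scores)

print(f"Equal-weight average:     {equal:.1f}")
print(f"Recency-weighted average: {remembered:.1f}")
```

The recency-weighted figure comes out lower than the equal-weight one because the single bad week sits at the end of the quarter, which is the distortion the narrative introduces and a full record avoids.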

The Trust Dimension

The most underrated cost of bad performance reviews is what they do to the relationship between agents and the measurement system.

When agents experience reviews as inaccurate (built on too few calls, distorted by recency, disconnected from their actual work), they stop trusting the entire performance apparatus. The QA scores lose credibility. The coaching loses authority. The whole system gets treated as a bureaucratic ritual to be survived rather than a genuine signal to act on.

Comprehensive measurement rebuilds this trust, but only if it’s transparent. Agents who can see their own performance data continuously, understand how it’s calculated, and verify it against their own experience of their work engage with it very differently than agents who receive a quarterly verdict from a black box. The shift from sampling to full coverage matters most when the agent can see the same data the manager sees.

Five Things You Can Do This Week

1. Count the calls behind your last review cycle. How many calls per agent actually informed the QA component? Divide by total calls handled. The coverage will be lower than the review’s authority implies.

2. Test for recency bias in manager narratives. Read three recent reviews. Note how many specific examples come from the final month versus the first two. The skew will be visible.

3. Ask agents whether their review describes them. The answer tells you whether your process has credibility. Disagreement is a signal the inputs are inadequate.

4. Separate signal from sampling noise. For one agent, compare their QA score on the reviewed sample against their score across a much larger set of calls. If the numbers differ materially, your sample isn’t representative.

5. Move toward continuous feedback. Even a monthly version of the review, built on more calls, beats a quarterly verdict built on fifteen. Frequency and coverage both improve accuracy.
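Steps 1, 2, and 4 are simple arithmetic, and a short script can make them routine. The sketch below is one way to do it: the function names are our own, the 15 and 1,500 come from the example in this article, and every other number is hypothetical data you would replace with your own.

```python
from statistics import mean

def qa_coverage(reviewed_calls, total_calls):
    """Step 1: fraction of handled calls that informed the QA score."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return reviewed_calls / total_calls

def final_month_share(example_months):
    """Step 2: share of narrative examples dated in month 3 of the quarter."""
    if not example_months:
        raise ValueError("need at least one example")
    return sum(1 for m in example_months if m == 3) / len(example_months)

def sample_gap(sample_scores, all_scores):
    """Step 4: mean QA score on the reviewed sample minus the mean
    across the larger set of calls."""
    return mean(sample_scores) - mean(all_scores)

# Step 1, using the numbers from this article.
coverage = qa_coverage(15, 1500)

# Step 2, HYPOTHETICAL: month of the quarter (1-3) for each specific
# example cited across three manager narratives.
share = final_month_share([3, 3, 2, 3, 3, 1, 3])

# Step 4, HYPOTHETICAL scores for one agent.
sample_scores = [82, 90, 75]
all_scores = [82, 90, 75, 88, 91, 86, 89, 93]
gap = sample_gap(sample_scores, all_scores)

print(f"QA coverage:                {coverage:.1%}")
print(f"Examples from final month:  {share:.0%}")
print(f"Sample minus full-set mean: {gap:+.1f} points")
```

What counts as a material gap is a judgment call for your team. The point of the script is only to put the numbers side by side.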

A performance review built on fifteen calls and three weeks of memory isn’t a measurement. It’s an impression with a form attached. The agents who say it doesn’t describe them are giving you accurate feedback about your process, and the fix isn’t a better form; it’s enough data to actually describe the work.



Agent Performance Reviews: The Quarterly Ritual That Measures the Wrong Quarter
