AI Agents in Customer Service: Helping Versus Pretending to Help

A consumer brand we worked with launched what their vendor described as an “AI customer service agent” in late 2025. The marketing positioned it as autonomous problem-solving. The demo showed it handling returns end-to-end. The pricing reflected an enterprise commitment. Six months later, the implementation team came to us frustrated. The “agent” was effectively an elaborate decision tree wrapped in conversational language. It handled simple cases well, escalated nearly everything complex, and the resolution work the marketing had promised was still being done by humans.

We looked at the deployment, then at several other “AI agent” deployments across our client base. The pattern was consistent. The category “AI agent” had become marketing shorthand applied to dramatically different capabilities, from sophisticated rules engines with chat interfaces all the way to genuinely autonomous systems with tool use and reasoning. The customer-facing labels were nearly identical. The underlying capabilities were not.

This matters because the operational implications of deploying a real agentic system are completely different from deploying a sophisticated chatbot. Conflating the two, which most vendor pitches do, leads to procurement decisions, implementation plans, and governance frameworks that don’t match what’s actually being deployed.

What Distinguishes a Real Agent

The technical literature on agentic AI converges on a few characteristics that separate genuine agents from advanced rule-based systems.

Tool use. Real agents can take actions in external systems (query databases, update records, trigger workflows), not just generate text responses. A system that can only talk isn’t an agent in the meaningful sense.

Planning across multiple steps. An agent can decompose a complex task into a sequence of sub-tasks, execute them, and adapt the plan based on intermediate results. A system that handles one turn at a time, without state across turns, isn’t agentic.

Decision-making under uncertainty. Agents can reason about confidence in their own actions, decide when to act and when to defer, and choose between multiple possible approaches. A system that always takes the highest-confidence next action isn’t reasoning; it’s executing.

Goal persistence. Agents pursue an objective across multiple interactions, maintaining state and adapting strategy. A system that resets after each turn isn’t pursuing anything.

By these criteria, most “AI agents” being marketed to contact centers are not, in fact, agents. They’re sophisticated conversational interfaces over rules engines or retrieval systems. This isn’t a criticism; sophisticated chatbots have real value. It’s a clarification that affects how they should be procured, deployed, and governed.
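The four criteria can be written down as a quick checklist. The sketch below is only an illustration: the class name, the field names, and the cut-offs between labels are assumptions made for this example, not an established taxonomy.

```python
from dataclasses import dataclass


@dataclass
class CapabilityAssessment:
    """Yes/no answers for the four criteria (hypothetical field names)."""
    tool_use: bool               # acts in external systems, not just replies
    multi_step_planning: bool    # decomposes tasks and adapts the plan
    uncertainty_reasoning: bool  # decides when to act and when to defer
    goal_persistence: bool       # keeps state and an objective across turns


def classify(a: CapabilityAssessment) -> str:
    """Map the answers to a rough label. The cut-offs are illustrative."""
    met = sum([a.tool_use, a.multi_step_planning,
               a.uncertainty_reasoning, a.goal_persistence])
    if met == 4:
        return "agentic"
    if met >= 2:
        return "partially agentic"
    return "conversational interface"


# A hypothetical deployment: it can call a returns API, but handles one
# turn at a time and escalates anything outside its scripted paths.
vendor_bot = CapabilityAssessment(
    tool_use=True,
    multi_step_planning=False,
    uncertainty_reasoning=False,
    goal_persistence=False,
)
print(classify(vendor_bot))
```

The value is less in the label than in being forced to answer each question with evidence from the running system rather than from the pitch deck.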

Why the Distinction Matters Operationally

Genuine agentic systems and sophisticated chatbots behave differently in production, and the operational handling needs to match.

Failure modes differ. A rules-based system fails when it encounters a case it wasn’t designed for, usually predictably and often gracefully. An agentic system can fail in less predictable ways because it’s making decisions across longer chains. The QA and monitoring requirements are different.

Auditability differs. A rules-based system has traceable decision paths. An agentic system’s behavior depends on model state, tool-call sequences, and context that may not be fully logged by default. Regulatory environments care about this distinction.

Update cycles differ. A rules-based system gets updated by changing rules. An agentic system gets updated by retraining or by prompt engineering, with corresponding effects on behavior that can be subtle and hard to test comprehensively.

Cost structures differ. Rules-based systems have predictable per-interaction cost. Agentic systems have variable cost depending on reasoning depth and tool calls, which complicates capacity planning.

Procurement decisions made without these distinctions clearly in mind frequently produce mismatched governance: agentic systems treated like rules engines (insufficient oversight) or rules engines treated like agents (over-engineered governance).
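The auditability and cost points can be made concrete with a small trace object that records every tool call an agent makes during one interaction. This is a minimal sketch, not a description of any vendor’s logging: `InteractionTrace`, `ToolCallRecord`, the two stand-in tools, and the per-call cost figures are all hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class ToolCallRecord:
    step: int
    tool: str
    arguments: dict
    result: Any
    cost_usd: float   # assumed to be supplied by the caller
    timestamp: float


@dataclass
class InteractionTrace:
    interaction_id: str
    records: list = field(default_factory=list)

    def call(self, tool_name: str, fn: Callable[..., Any],
             cost_usd: float, **arguments: Any) -> Any:
        """Run a tool and record what was called, with what, and the result."""
        result = fn(**arguments)
        self.records.append(ToolCallRecord(
            step=len(self.records) + 1,
            tool=tool_name,
            arguments=arguments,
            result=result,
            cost_usd=cost_usd,
            timestamp=time.time(),
        ))
        return result

    def total_cost(self) -> float:
        return round(sum(r.cost_usd for r in self.records), 6)

    def to_json(self) -> str:
        # Serialize the whole trace so it can be stored for later review.
        return json.dumps(asdict(self), default=str)


# Stand-in tools; a real deployment would call actual backend systems.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "delivered"}


def issue_refund(order_id: str, amount: float) -> dict:
    return {"order_id": order_id, "refunded": amount}


trace = InteractionTrace("demo-001")
order = trace.call("lookup_order", lookup_order, cost_usd=0.002, order_id="A100")
if order["status"] == "delivered":
    trace.call("issue_refund", issue_refund, cost_usd=0.004,
               order_id="A100", amount=25.0)

print([r.tool for r in trace.records], trace.total_cost())
```

Because the number of recorded steps depends on what the agent decides to do, both the audit trail and the cost vary per interaction, which is the property a rules engine with a fixed path does not have.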

What “Real” Agentic AI Currently Does Well in Contact Centers

Where genuine agentic systems are starting to deliver value, the use cases tend to share characteristics.

Multi-step transactional work. Account changes, returns processing, scheduling: tasks that require multiple system interactions and some judgment about which path to take. Agentic systems can compress these into single customer interactions where rules-based systems would require multiple escalations.

Investigation and resolution. Looking up information across multiple systems to diagnose an issue, then taking the resolution action. Rules-based systems can do simple versions of this; agentic systems handle the variations that rules engines need an explicit path for.

Personalized engagement. Tailoring the approach to the customer’s history, current state, and apparent preferences. Agentic systems can make this judgment in real time; rules engines need pre-built rules for each variation.

Handoff with full context. When the agent can’t or shouldn’t complete a task, transferring to a human with substantive context preserved. This is one of the highest-value applications and one of the worst-implemented across current deployments.
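What “substantive context” means in a handoff is easier to judge with a concrete shape in front of you. The sketch below is one hypothetical structure; the `HandoffPacket` name and its fields are assumptions chosen for illustration, and a real deployment would adapt them to its own CRM and ticketing fields.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffPacket:
    """What a human agent should receive when the AI stops (illustrative)."""
    customer_id: str
    stated_issue: str
    steps_completed: list = field(default_factory=list)
    steps_blocked: list = field(default_factory=list)
    reason_for_handoff: str = ""

    def summary(self) -> str:
        lines = [
            f"Customer: {self.customer_id}",
            f"Issue: {self.stated_issue}",
            "Done: " + ("; ".join(self.steps_completed) or "nothing yet"),
            "Blocked: " + ("; ".join(self.steps_blocked) or "nothing"),
            f"Why a human is needed: {self.reason_for_handoff}",
        ]
        return "\n".join(lines)


packet = HandoffPacket(
    customer_id="C-1042",
    stated_issue="Refund for a damaged item",
    steps_completed=["verified identity", "located order"],
    steps_blocked=["refund exceeds automatic approval limit"],
    reason_for_handoff="Approval required above the configured limit",
)
print(packet.summary())
```

A useful test of any handoff is whether the human can continue from the “Blocked” line without asking the customer to repeat anything listed under “Done”.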

What Most Current Deployments Are Actually Getting

The marketing of “AI agents” has significantly outrun the deployments. Most contact center “AI agent” deployments in production today do one of the following things well, not the full agentic capability.

Many are sophisticated chatbots with conversational polish: useful, but conceptually closer to evolved IVR than to autonomous agents.

Some are retrieval-augmented systems that surface relevant information and let humans act on it: valuable, but not autonomous problem-solving.

A smaller number are doing genuine multi-step transactional work in narrow domains, which is the actual frontier of useful agentic deployment.

Knowing which one you have determines what you can realistically expect, what governance you need, and what claims you can defensibly make about the deployment.

Five Things You Can Do This Week

1. Test your current “AI agent” against the four criteria. Tool use, multi-step planning, uncertainty reasoning, goal persistence. The result will clarify what you actually have.

2. Audit what your deployment actually resolves vs. escalates. True resolution rate, not chatbot containment. The number will inform what you should expect operationally.

3. Map your governance to what you actually have. If your “agent” is a sophisticated rules engine, your governance can be lighter. If it’s genuinely agentic, your governance needs to be heavier.

4. Listen to 20 customer-facing interactions. Are they conversations, or decision-tree navigation in friendly language? The customer can usually tell.

5. Define what “real success” looks like for your deployment. Not vendor metrics, but operational outcomes. The clearer this is, the easier the next 12 months become.
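For item 2, the gap between containment and true resolution is easy to see with a small calculation. The definitions below are assumptions made for this sketch: containment counts any interaction that never reached a human, while true resolution also requires a confirmed fix and no repeat contact within seven days. The sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    escalated_to_human: bool
    issue_resolved: bool          # confirmed outcome, not "chat ended"
    recontacted_within_7d: bool   # customer came back about the same issue


def containment_rate(items: list) -> float:
    """Share of interactions that never reached a human."""
    if not items:
        return 0.0
    return sum(not i.escalated_to_human for i in items) / len(items)


def true_resolution_rate(items: list) -> float:
    """Share resolved by the system alone, with no repeat contact."""
    if not items:
        return 0.0
    resolved = sum(
        (not i.escalated_to_human)
        and i.issue_resolved
        and (not i.recontacted_within_7d)
        for i in items
    )
    return resolved / len(items)


# Invented sample of ten interactions.
sample = (
    [Interaction(False, True, False)] * 4     # resolved by the system
    + [Interaction(False, False, False)] * 3  # customer gave up
    + [Interaction(False, True, True)] * 1    # marked resolved, came back
    + [Interaction(True, True, False)] * 2    # escalated to a human
)

print(containment_rate(sample), true_resolution_rate(sample))
```

In this made-up sample, containment is twice the true resolution rate, which is the kind of gap the audit is meant to surface.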

The “AI agent” category currently spans capabilities that differ by orders of magnitude. The marketing flattens this. Procurement decisions made on flattened marketing produce deployments that don’t match what was promised, governance that doesn’t match what was deployed, and customer experiences that don’t match what was sold. The first step in getting value from agentic AI is being precise about what you actually have, which is harder than it sounds when the vendor pitches don’t differentiate.
