Human Oversight Can No Longer Protect Customers From AI Hallucinations

Continuous AI testing is emerging as a critical safeguard against customer-facing errors and brand damage


Published: June 15, 2026

Francesca Roche

The growing use of AI in CX has created a risk of hallucination, in which systems deliver confident but inaccurate information to customers at scale.

The issue gained renewed attention after law firm Sullivan & Cromwell acknowledged AI-generated errors in a federal court filing, demonstrating that even highly reviewed environments are vulnerable to convincing AI mistakes.

For CX leaders, customer-facing AI now requires further attention, as a single error can quickly become a widespread trust, compliance, and reputational problem.

Jackie Swanson, Managing Partner at Gartner Consulting, told CX Today that CX AI hallucinations go largely unreviewed, enabling AI systems to confidently deliver incorrect information that looks indistinguishable from correct answers.

“Court filings get scrutinized. CX interactions do not. The same model that confidently invented a case citation in front of a federal judge is confidently inventing return policies, warranty terms, and product specifications inside enterprise chatbots every day,” she explained.

“The hardest part about CX hallucinations is that they are invisible by design. A model gives a wrong answer with the same tone and the same fluency as a right one.”

In April, the Wall Street law firm Sullivan & Cromwell admitted to submitting a bankruptcy court filing containing AI-generated errors.

This included numerous inaccuracies such as fabricated citations, incorrect quotations, and references to legal authorities that either did not exist or had been misrepresented.

Furthermore, reports indicate that these errors spanned dozens of citations and included passages of other real cases, illustrating how confidently presented AI outputs can appear credible even when they are incorrect.

And whilst the incident was caught early, it highlights why AI hallucinations can be particularly dangerous in CX environments, where many AI-generated responses reach customers immediately, often without any human review.

Speaking with CX Today, Sushil Kumar, CEO at Cyara, emphasized how the court case demonstrated the limits of human oversight, showing that even highly skilled manual review cannot effectively scale to monitor the volume of outputs produced by modern AI systems.

“Even at a highly respected law firm, with intelligent lawyers and human review in place, AI-generated errors still slipped through,” he noted.

“Humans aren’t built to audit machines at scale. A single agent trying to review a million conversations a day simply isn’t possible. And when AI makes a mistake, it doesn’t announce itself.”

Unlike a legal filing, CX systems can distribute inaccurate information simultaneously across channels, with customers likely to act on those responses, creating serious reputational consequences for the organization.

Plausibility is often what makes hallucinations so difficult to detect, and in CX the same dynamic seen in the courtroom can allow errors to spread even faster.

“There’s no flashing red light, no obvious signal that something is wrong,” he pointed out.

“The response sounds fluent, polished, and confident, even when it’s inaccurate, and that’s the trap CX leaders are falling into.”

As a result, robust monitoring systems, escalation workflows, and human oversight become essential safeguards against AI-generated misinformation reaching customers at scale.

Why AI Hallucinations Are Potentially More Dangerous in CX

In CX, AI hallucinations present a unique risk because customer-facing systems operate at an enormous scale and often without human review.

In contrast to a courtroom filing, an AI assistant working from inaccurate information can repeat it across thousands or even millions of interactions before the issue is identified and corrected.

This challenge is rooted in the reality of customer behavior: interactions are frequently unpredictable and often fall outside clean, scripted paths.

“No customer behaves like a test script. Real customers interrupt, change their minds, get emotional, and call customer service lines at 11pm angry about a bill,” Kumar highlights.

“Even if a bot handles 95% of interactions well, the remaining 5% can become the moments that matter most.”

That seemingly small share could represent thousands or millions of customers receiving incorrect information, failing to resolve their issues, or leaving interactions more frustrated.

Furthermore, customers often judge AI mistakes more harshly than human ones: according to Cyara’s research, 61% of customers report that bot failures are more frustrating than human errors.
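To put rough numbers on that remaining 5%, the arithmetic can be sketched as follows. The 95% figure comes from Kumar's quote and the one-million-conversations-a-day volume echoes his earlier illustration; the function name and the 30-day window are illustrative assumptions rather than figures from Cyara.

```python
def affected_interactions(daily_volume: int, success_rate: float, days: int = 1) -> int:
    """Estimate how many interactions fall outside the share the bot handles well."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    failure_rate = 1.0 - success_rate
    # Round to absorb floating-point noise in the multiplication.
    return round(daily_volume * failure_rate * days)


# Assumed volume: one million conversations a day, 95% handled well.
per_day = affected_interactions(1_000_000, 0.95)
per_month = affected_interactions(1_000_000, 0.95, days=30)
print(per_day, per_month)
```

Under those assumptions, roughly 50,000 conversations a day, or about 1.5 million over 30 days, would land in the share the bot handles poorly.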

As a result, even relatively infrequent hallucinations can have outsized effects on customer trust and satisfaction.

Is AI Quietly Rewriting the Customer Relationship?

Beyond the immediate business risks, AI hallucinations can also alter customer expectations and perceptions of a brand.

This trust dynamic between brand and customer makes AI-generated misinformation particularly dangerous, as customers are not able to accurately distinguish between human-provided and AI-provided information.

“Customers don’t have a relationship with the model provider, they have a relationship with the brand,” Kumar stated.

“If an AI assistant gives the wrong answer, the customer won’t be angry at the model, they’ll be angry with the company. The enterprise chooses the model, the deployment, and the testing, so it is ultimately responsible for the outcomes.”

When responsibility rests with the enterprise deploying the technology, organizations cannot deflect problem ownership when AI-generated errors affect customers.

As a result, organizations placing AI in front of customers must be prepared to stand behind the answers it provides.

Here, robust CX assurance becomes critical, as effective testing and monitoring can help organizations identify and resolve issues before they reach customers, reducing the likelihood of preventable failures.

By continuously evaluating AI interactions and validating customer-facing responses, enterprises can protect both customer trust and brand reputation while maintaining experience accountability.

The Different Types of Hallucinations Enterprises Need to Watch For

AI hallucinations are often discussed as a single problem, but in practice they emerge in several different forms, some obvious and some subtle, each carrying distinct risks for CX teams.

For example, contextual drift can occur during longer conversations when a model moves away from accurate information, building on earlier mistakes and creating responses that sound plausible but are increasingly disconnected from reality.

“AI systems require a different model because the risk does not end at launch,” acknowledged Kumar.

“Once the system is exposed to real customers, new behaviors and failure modes start to appear.”

AI systems cannot be approached like traditional software; they will likely encounter unexpected conversational paths and customer behaviors that can reveal entirely new failure modes.

As a result, enterprises need to view hallucination management as an ongoing operational discipline where testing must evolve from a release-stage activity into an always-on capability.

Why Human Oversight Alone Is No Longer Enough

Human oversight remains as important as ever, but it is no longer sufficient as the primary safeguard against risk.

Traditional governance models often assume that trained employees can review outputs, but this approach is becoming increasingly impractical when AI systems operate continuously across millions of interactions and evolve through frequent updates.

Modern AI systems have simply outgrown what manual oversight alone can reasonably manage.

“The lesson isn’t that humans should be removed from the process. Human oversight still matters, but it should function as a checkpoint rather than the entire CX assurance system,” Kumar said.

“Effective AI governance has to move at machine-speed, with automated validation, guardrails, and real-time testing.”

The nature of AI model updates has contributed to this shift, as businesses often receive no clear warning about how those changes could affect customer interactions.

This creates a significant governance challenge: systems that were thoroughly tested and approved may behave differently weeks or months later.

“Even a small model shift can break a customer flow, alter tone, or create new failure points in an experience that was validated just a month earlier, and manual regression testing simply cannot keep up with that pace of change,” he continued.

As AI adoption accelerates, organizations need governance frameworks that combine human judgment with continuous automated assurance.

Whilst human reviewers remain essential for accountability and decision-making, scalable risk management now depends on systems that can monitor and validate AI behavior in real time.

How Enterprises Can Prevent Hallucinations From Becoming Business Risks

Preventing AI hallucinations from becoming business risks requires enterprises to shift from reactive problem-solving to continuous assurance, rather than relying solely on periodic testing or manual reviews.

This means using systems capable of evaluating AI performance continuously, including synthetic customers that allow organizations to simulate a wide range of real-world interactions before they affect actual users.

This approach helps uncover weaknesses and provides ongoing visibility into how AI performance evolves over time.
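As a rough illustration of the synthetic-customer idea, the sketch below sends scripted and off-script messages to an assistant and flags any reply that neither states the approved policy fact nor hands off to a human. Everything here is hypothetical: `stub_bot` stands in for a real assistant, and the 30-day return and 2-year warranty terms are invented policy values, not details from Cyara or any real deployment.

```python
from dataclasses import dataclass

# Assumed hand-off wording; a real check would use the assistant's own signal.
ESCALATION_PHRASE = "connect you with an agent"


@dataclass(frozen=True)
class Scenario:
    name: str
    utterances: tuple[str, ...]  # messages a synthetic customer might send
    approved_fact: str           # text the approved policy requires in an answer


def stub_bot(message: str) -> str:
    """Hypothetical stand-in for the deployed assistant."""
    text = message.lower()
    if "return" in text:
        return "You can return items within 30 days of delivery."
    if "warranty" in text:
        # Deliberately wrong: the assumed policy below says 2 years.
        return "Your warranty lasts 5 years."
    return "Let me connect you with an agent."


def run_scenarios(bot, scenarios):
    """List every message whose reply neither states the approved fact nor escalates."""
    failures = []
    for scenario in scenarios:
        for utterance in scenario.utterances:
            reply = bot(utterance).lower()
            grounded = scenario.approved_fact.lower() in reply
            escalated = ESCALATION_PHRASE in reply
            if not (grounded or escalated):
                failures.append(f"{scenario.name}: {utterance!r}")
    return failures


scenarios = [
    Scenario("returns",
             ("How do I return this?", "I changed my mind, can I send it back?"),
             "30 days"),
    Scenario("warranty",
             ("What is the warranty?", "It's 11pm and I'm angry. Is my warranty still valid?"),
             "2 years"),
]
failures = run_scenarios(stub_bot, scenarios)
print(failures)
```

In this toy run both warranty messages are flagged, because the stub confidently states the wrong term, while the off-script returns message passes because the stub escalates instead of guessing. A production harness would need far richer scenarios and a more robust check than substring matching.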

“CX teams need to treat AI like production infrastructure, not just another digital feature,” Kumar emphasized.

“Drift detection, automated test coverage, and ongoing quality monitoring are table stakes for maintaining trust, consistency, and performance in the customer experience.”

These tools help organizations identify when AI behavior changes, assess the impact of model updates, and ensure that customer-facing systems remain aligned with company policies and service standards.
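One simple way to picture drift detection is to rerun the same automated test suite on a schedule and compare its pass rate with the last approved baseline. The sketch below assumes each test yields a plain pass/fail result; the function names and the two-percentage-point tolerance are illustrative choices, not an industry standard or a description of any vendor's product.

```python
def pass_rate(results: list[bool]) -> float:
    """Share of test conversations that met expectations."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


def drift_alert(baseline: list[bool], current: list[bool], tolerance: float = 0.02) -> bool:
    """Flag a drop in pass rate larger than the tolerated margin."""
    return pass_rate(baseline) - pass_rate(current) > tolerance


# Assumed results: 98 of 100 tests passed when the flow was approved.
baseline_run = [True] * 98 + [False] * 2
# After a hypothetical model update, only 93 of 100 pass.
after_update = [True] * 93 + [False] * 7
# A run within normal variation: 97 of 100 pass.
normal_run = [True] * 97 + [False] * 3

alert_after_update = drift_alert(baseline_run, after_update)
alert_normal = drift_alert(baseline_run, normal_run)
print(alert_after_update, alert_normal)
```

Here the five-point drop after the update triggers an alert, while the one-point dip does not. Pass rate is a coarse signal, so a real monitoring setup would likely also track which specific flows started failing.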

In fact, Kumar believes AI assurance is following a path similar to cybersecurity, with testing likely shifting from a technical consideration to a board-level risk management priority.

“Cybersecurity took 20 years to become a mandatory business function, but AI testing will only take three to five years for enterprises to take just as seriously,” he argues.

As a result, success will belong to the organizations best at operationalizing AI assurance, with continuous testing becoming an essential component of enterprise risk management as AI evolves.
