The $735 Problem: Why Enterprise AI Governance is Set Up to Fail

Research from Gartner, TELUS, and Sinch explored the impact of AI governance on the CX and customer service space

Published: May 29, 2026

Rhys Fisher

Three separate research releases landed this week, each examining AI governance from a different angle.

Put them side by side and a consistent pattern emerges that should give most CX leaders pause.

Worldwide AI spending is projected to reach $2.52 trillion in 2026. The amount allocated to AI trust, risk, and security management is approximately $3.43 billion.

That works out to roughly one dollar in security for every $735 spent on capability, according to TELUS Digital’s GenAI Safety Model Benchmark, built on more than 620,000 adversarial tests across 34 AI models from 10 global providers.
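The ratio can be checked directly from the two figures quoted above. The short sketch below assumes that total projected AI spending stands in for "capability" spending; the variable names are illustrative and do not come from the TELUS Digital report.

```python
# Figures as quoted in the article, in US dollars.
total_ai_spend = 2.52e12       # projected worldwide AI spending, 2026
trust_risk_security = 3.43e9   # AI trust, risk, and security management

# Assumption: total spend is used as the "capability" figure.
ratio = total_ai_spend / trust_risk_security
share_pct = 100 * trust_risk_security / total_ai_spend

print(f"${ratio:.0f} of AI spend per $1 of security")
print(f"{share_pct:.2f}% of AI spend goes to trust, risk, and security")
```

Put another way, the security allocation is a little over a tenth of one percent of the total.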

Bret Kinsella, General Manager and Senior Vice President of Fuel iX™ at TELUS Digital, frames the consequences of that imbalance plainly:

“The real risk isn’t that AI models have vulnerabilities. It’s that most organizations have no way of knowing which vulnerabilities apply to them.”

That’s the thread running through all three studies. Organizations are deploying AI agents at scale and at speed, and the governance infrastructure behind those deployments is not keeping pace with the risk they’re accumulating.

A Problem that Has Already Arrived

The TELUS Digital benchmark found that 86% of organizations have already experienced an AI-related security incident.

Across the 34 models tested, vulnerability rates ranged from 1.3% to 93%. Models with built-in reasoning capabilities were substantially harder to exploit, recording a 19.9% vulnerability rate compared to 55.1% for models without that capability.

This gap has direct implications for how enterprises approach model selection.

The risk categories carrying the most severity were privacy exploitation, fraud, and cybersecurity threats.

Researchers also flagged what they call the “refuse-but-engage” pattern, where models initially decline a harmful request but then provide related information that could be put to harmful use anyway. That’s not the kind of behavior a standard launch-day compliance check is designed to catch.

Sinch’s AI Production Paradox research puts numbers to what those vulnerabilities look like once an agent is running in production.

Of organizations that have deployed AI communications agents, 74% have been forced to roll them back or shut them down entirely. PII or customer data exposure is the leading cause, cited by 31% of organizations. Hallucination or brand risk follows at 22%.

These are not failures that stay tucked away in internal logs; they play out on customer-facing channels, in live interactions, in real time.

When they do, the technical incident becomes a customer trust event, and those are considerably harder to recover from.

Confidence Doesn’t Protect You

The Sinch data contains a finding that is easy to skim past but deserves attention.

90% of enterprise decision-makers describe themselves as confident in their AI agent readiness.

Among those same organizations, 75% have experienced at least one governance rollback. Confidence, it turns out, has essentially no correlation with governance outcomes.

More striking still, among organizations that describe their guardrails as fully mature, the rollback rate climbs to 81%, above the 74% average.

Sinch attributes this to instrumentation: more mature governance programs are better instrumented, so they detect failures that less mature organizations simply never see.

The companies reporting zero governance failures may not be running cleaner AI deployments. They may be operating with less visibility into what’s actually happening.

TELUS Digital reaches a similar conclusion from a testing angle. The benchmark argues against point-in-time safety evaluations, making a direct case for continuous, automated testing built into developer workflows rather than periodic spot-checks.

In a healthcare deployment cited in the research, that approach cut testing time by 97% while achieving 99.6% accuracy in vulnerability identification.

As Kinsella puts it:

“Enterprises need to move from spot-checking GenAI solutions at launch to testing on an ongoing basis, or they’re leaving vulnerabilities exposed that represent risk that could be avoided.”
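One way to picture testing "on an ongoing basis" is a check that runs with every build and blocks release when too many adversarial prompts succeed. The sketch below is a minimal illustration under stated assumptions: `AttackResult`, `vulnerability_rate`, `release_gate`, and the threshold are all hypothetical, and none of it reflects TELUS Digital's actual methodology or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackResult:
    """Outcome of one adversarial test (hypothetical structure)."""
    category: str    # e.g. "privacy", "fraud", "cybersecurity"
    succeeded: bool  # True if the model produced a harmful response


def vulnerability_rate(results: list[AttackResult]) -> float:
    """Share of adversarial tests that succeeded against the model."""
    if not results:
        raise ValueError("no test results to evaluate")
    return sum(r.succeeded for r in results) / len(results)


def release_gate(results: list[AttackResult], max_rate: float) -> bool:
    """Return True if the build may ship under the assumed threshold."""
    return vulnerability_rate(results) <= max_rate


# Illustrative data only: one of four attacks succeeds.
sample = [
    AttackResult("privacy", False),
    AttackResult("fraud", True),
    AttackResult("cybersecurity", False),
    AttackResult("privacy", False),
]
print(vulnerability_rate(sample))             # 0.25
print(release_gate(sample, max_rate=0.20))    # False
```

Run on every model update or prompt change rather than once at launch, a gate like this would surface regressions as they appear instead of at the next scheduled review.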

From a practical perspective, the question for any leadership team isn’t ‘Are we confident in our AI readiness?’ It’s whether they would know about a failure before their customers did, and how fast.

A Governance Framework Built for the Wrong Problem

If Sinch and TELUS Digital document what is happening in production, Gartner’s analysis offers a structural explanation for why the governance frameworks in place aren’t catching it.

By 2027, Gartner forecasts that 40% of enterprises will demote or decommission autonomous AI agents due to governance gaps identified only after production incidents occur.

The root cause, in Gartner’s assessment, is governance that gets applied uniformly across agents with fundamentally different risk profiles.

Shiva Varma, Senior Director Analyst at Gartner, described the core failure, arguing that “enterprises are treating AI agent governance as binary, either locked down or fully trusted, and that is the root cause of failure.

“Agents operate at different autonomy levels and across different trust boundaries. When the same controls are applied indiscriminately, organizations encounter two common failure modes.”

Those failure modes run in opposite directions. Over-restricting simpler, lower-risk agents slows delivery and pushes teams toward ungoverned workarounds. Under-restricting agents with real operational autonomy raises security, compliance, and business risk precisely where the stakes are highest.

Gartner’s proposed framework classifies agents across four autonomy levels:

  • Observe (read-only access)
  • Advise (generates recommendations, humans execute)
  • Act with Approval (executes only after explicit sign-off)
  • Act Autonomously (independent execution within guardrails)

Each level carries proportionate governance requirements. A reporting agent that reads data and surfaces summaries carries a fundamentally different risk profile to an AI system making real-time decisions on a live customer call, and the controls around each should reflect that.
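The idea of proportionate controls can be sketched in a few lines. The four levels below follow Gartner's classification as described above; the `Controls` fields and the specific mapping in `HYPOTHETICAL_CONTROLS` are assumptions made for illustration and are not part of Gartner's framework.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    OBSERVE = 1            # read-only access
    ADVISE = 2             # generates recommendations, humans execute
    ACT_WITH_APPROVAL = 3  # executes only after explicit sign-off
    ACT_AUTONOMOUSLY = 4   # independent execution within guardrails


@dataclass(frozen=True)
class Controls:
    audit_trail: bool
    human_sign_off: bool
    live_monitoring: bool


# Hypothetical mapping: illustrative only, not Gartner's requirements.
HYPOTHETICAL_CONTROLS = {
    AutonomyLevel.OBSERVE: Controls(True, False, False),
    AutonomyLevel.ADVISE: Controls(True, False, False),
    AutonomyLevel.ACT_WITH_APPROVAL: Controls(True, True, False),
    AutonomyLevel.ACT_AUTONOMOUSLY: Controls(True, False, True),
}


def may_execute(level: AutonomyLevel, approved: bool = False) -> bool:
    """Whether the agent itself may carry out an action at this level."""
    if level is AutonomyLevel.ACT_AUTONOMOUSLY:
        return True
    if level is AutonomyLevel.ACT_WITH_APPROVAL:
        return approved
    return False  # Observe and Advise agents never execute actions
```

The point of the sketch is the shape rather than the values: each agent is assigned a level once, and the controls follow from that level instead of being applied uniformly.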

The link to Sinch’s research is fairly clear. Sinch found that 16% of rollbacks can’t be fully diagnosed because no audit trail exists. In Gartner’s terms, that is a governance design failure rather than a monitoring gap.

When an agent’s level of autonomy and its oversight framework aren’t matched from the outset, the ability to reconstruct what went wrong is lost.

Getting Ahead of It

The practical thread through all three studies runs in the same direction, even if the starting points differ.

TELUS Digital’s benchmark makes a strong case that deployment-time safety checks are no longer adequate.

Model behavior changes with updates, adversarial techniques evolve, and static evaluations miss the patterns that only surface in real-world interaction flows.

The healthcare case study in the research, which shows a 97% reduction in testing time and 99.6% accuracy in vulnerability identification, makes the argument that continuous automated testing is operationally viable, not just theoretically preferable.

Gartner’s autonomy framework doesn’t ask organizations to rebuild their governance architecture from the ground up. What it does ask is that the controls applied to an agent reflect the level of independence that agent has been granted.

An agent that reads and reports is not the same governance problem as one executing actions in a live customer interaction, and treating them identically is where the failure modes Varma describes tend to originate.

Sinch’s data on the 81% rollback rate among fully mature programs is probably the most important finding of the three to internalize correctly.

The temptation is to read it as a warning about the limits of governance investment. The more accurate reading is that visibility and protection are different capabilities, and that most organizations have been building confidence rather than infrastructure.

The $735 Gap

The ratio TELUS Digital puts on the table is a useful frame for this entire discussion. One dollar in security for every $735 spent on capability reflects a broader set of assumptions about where AI risk actually lives and at what point it becomes real.

The research published this week across three independent organizations suggests those assumptions are badly calibrated.

The vulnerabilities are already present in the models being deployed. The governance frameworks being applied are structurally mismatched to the agents they’re supposed to cover.

And the failures, when they arrive, are arriving in customer interactions rather than in controlled test environments.

For CX leaders with live AI deployments, the practical question is this: if something went wrong in a customer interaction right now, how long before you would know about it?

On the current evidence, most organizations don’t have a confident answer to that. They probably should.
