Despite recent improvements, AI remains unpredictable and unreliable.
Across the customer service and experience space, the technology is being used to answer questions, guide purchases, and increasingly handle conversations once reserved for human agents.
Yet a familiar problem keeps surfacing: these systems still behave unpredictably.
They can be fluent and insightful one moment, then miss something basic the next. And when that happens, the customer experience suffers.
Salesforce’s latest releases take direct aim at this reliability gap.
Across service, commerce, and AI research, the company is seemingly placing accuracy, consistency, and predictable behavior at the center of its AI system design.
And the CRM powerhouse isn’t tiptoeing around the issue.
This is evident with the launch of eVerse – the vendor’s new simulation environment built by its AI Research department to uncover failures before they ever reach a live customer.
In discussing the motivation behind eVerse, Salesforce pointed to what it calls “Jagged Intelligence”: instances where AI excels at complex tasks but struggles with simple ones.
The vendor believes that this phenomenon “creates unacceptable business risk” and that eVerse “directly addresses” the issue.
A Reliability Problem Hiding in Plain Sight
Through its Agentforce platform, Salesforce has been at the forefront of expanding the capabilities and usage of AI agents in the customer service and experience space.
These tools are moving beyond basic interactions to acting autonomously in key moments of the customer journey – highlighting just how prevalent AI now is within the sector.
As Kishan Chetan, EVP and GM of Salesforce Service Cloud, puts it:
“AI agents go beyond predictions and automation; they can understand context, take action, make decisions, and adapt in real time.”
Indeed, Salesforce’s 2025 State of Service report claims that 30% of service cases are currently handled by AI, with this percentage expected to rise to 50% by 2027.
Given the sometimes volatile nature of AI, this shift has the potential to drastically increase instances of unpredictability.
In an environment where one incorrect decision or confused response can alienate a customer and damage a brand’s reputation, there is plenty of risk accompanying the reward of further automation.
Salesforce appears to be attempting to combat the unreliability of AI so that the technology can become safer while its popularity continues to soar.
Breaking AI Before Customers Do
In a nutshell, eVerse is designed to break AI before customers do.
The training environment is unforgiving. It throws noise, accents, unpredictable behavior, and messy real-world scenarios at AI models to see how they cope.
Salesforce AI COO Madhav Thattai explained how Agentforce Voice served as one of eVerse’s guinea pigs:
“With eVerse, we were able to explore many nuances of human conversation before Agentforce Voice reached production.”
“This type of rigor is what turns breakthrough research into scalable products and dependable customer experiences.
“It’s how we’re expanding that same level of responsiveness and consistency across the full observability stack to solve our customers’ most complex needs.”
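To make the idea concrete, here is a minimal, hypothetical sketch of the kind of perturbation testing described above: take a prompt the agent handles cleanly, inject noise repeatedly, and measure how often it still succeeds. The `perturb`, `stress_test`, and `brittle_agent` names are illustrative assumptions, not part of eVerse.

```python
import random

def perturb(text: str, rng: random.Random) -> str:
    """Inject simple noise: a disfluency filler and a casing glitch
    (a toy stand-in for the accents and messy audio eVerse simulates)."""
    words = text.split()
    if len(words) > 1:
        words.insert(rng.randrange(len(words) - 1), "uh")  # filler word
    chars = list(" ".join(words))
    j = rng.randrange(len(chars))
    chars[j] = chars[j].swapcase()  # random casing noise
    return "".join(chars)

def stress_test(agent, prompts, trials=20, seed=0):
    """Run each prompt through the agent many times with fresh noise;
    report any prompt whose success rate drops below 100%."""
    rng = random.Random(seed)
    failures = []
    for prompt, expected in prompts:
        ok = sum(expected in agent(perturb(prompt, rng)).lower()
                 for _ in range(trials))
        if ok < trials:
            failures.append((prompt, ok / trials))
    return failures

# A toy "agent" that only matches clean input -- noise exposes it instantly.
def brittle_agent(utterance: str) -> str:
    return "your order ships tomorrow" if utterance == "where is my order" else "sorry?"

failures = stress_test(brittle_agent, [("where is my order", "ships tomorrow")])
print(failures)  # -> [('where is my order', 0.0)]
```

The point of the sketch is the shape of the loop, not the agent: a system that looks reliable on clean inputs can fail on every single noisy variant, which is exactly the gap simulation is meant to surface before customers do.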
UCSF Health is one of the companies currently involved in the pilot phase of eVerse.
Collaborating with clinical experts, the Salesforce team is training AI agents to handle healthcare billing, where precision is essential.
Early tests showed agents reaching up to 88% coverage across routine and complex tasks, with reinforcement learning and clinical oversight shaping the results.
Sara Murray, MD, MAS, VP & Chief Health AI Officer at UCSF Health, summarized the goal:
“When used responsibly, we believe AI can help our teams simplify one of the most complex parts of healthcare.”
Salesforce believes that the continuous trial-and-error training loop “transforms agents from generic language models into enterprise-specialized systems ready for production deployment.”
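Salesforce has not published the mechanics of that loop, but the underlying idea can be sketched as a simple hill-climbing routine: propose a tweak to the agent's policy, score it against task cases, and keep it only if the score improves. Everything below (the threshold policy, the billing-style cases, the function names) is a hypothetical illustration, not Salesforce's actual method.

```python
import random

def trial_and_error(score, mutate, policy, rounds=200, seed=0):
    """Minimal trial-and-error loop: propose a change, keep it only if
    the task score improves. In practice, human or clinical oversight
    would gate each accepted change."""
    rng = random.Random(seed)
    best = score(policy)
    for _ in range(rounds):
        candidate = mutate(policy, rng)
        s = score(candidate)
        if s > best:
            policy, best = candidate, s
    return policy, best

# Toy task: tune a confidence threshold so the agent answers confident
# billing questions and escalates uncertain ones to a human.
# Each case is (model confidence, should_answer_without_escalating).
cases = [(0.95, True), (0.40, False), (0.80, True), (0.30, False), (0.55, False)]

def score(threshold):
    return sum((conf >= threshold) == should_answer
               for conf, should_answer in cases)

def mutate(threshold, rng):
    return min(1.0, max(0.0, threshold + rng.uniform(-0.1, 0.1)))

policy, best = trial_and_error(score, mutate, policy=0.5)
print(best)  # -> 5 (all five cases handled correctly)
```

Even this toy version shows why the loop matters: the starting policy mishandles one case, and repeated scored trials nudge it to a setting that handles all of them, specializing generic behavior to the task at hand.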
Retail Has the Same Problem – With a Commercial Twist
Salesforce’s reliability push stretches into commerce too, where a new set of challenges is emerging.
Traffic from consumer AI channels, including upcoming integrations with ChatGPT, is rising sharply. Salesforce forecasts that during Cyber Week, intelligent agents will influence 22% of global orders.
That growth presents another risk if the AI representing your brand gives inconsistent answers.
To combat this, Salesforce has announced fresh capabilities for Agentforce Commerce.
The new features allow retailers to syndicate product data into AI channels with tight control over accuracy, pricing, and brand voice.
Self-described as the “first platform designed to give retailers full control over the entire agentic shopping journey,” Agentforce Commerce lets brands push product catalogs into AI platforms like ChatGPT while delivering personalized, agent-led journeys on their own sites to drive loyalty and higher lifetime value.
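What "tight control over accuracy, pricing, and brand voice" might look like in practice is a validation gate in front of the feed: check each product record before it is syndicated to an external AI channel, and hold back anything malformed. The field names and rules below are illustrative assumptions, not Agentforce Commerce's actual schema.

```python
# Hypothetical pre-syndication check for a product feed.
REQUIRED = ("sku", "title", "price", "currency")

def validate_product(item: dict) -> list:
    """Return a list of problems; an empty list means safe to syndicate."""
    problems = [f"missing {field}" for field in REQUIRED if field not in item]
    price = item.get("price")
    if isinstance(price, (int, float)) and price <= 0:
        problems.append("non-positive price")
    if len(item.get("title", "")) > 150:
        problems.append("title too long for channel")
    return problems

catalog = [
    {"sku": "SKU-001", "title": "Sterling silver charm", "price": 65.0, "currency": "USD"},
    {"sku": "SKU-002", "title": "Gold-plated ring", "price": 0, "currency": "USD"},
]

# Only records that pass every check leave the retailer's control.
syndicate = [p for p in catalog if not validate_product(p)]
print([p["sku"] for p in syndicate])  # -> ['SKU-001']
```

The design choice the sketch illustrates is a one-way gate: data flows out to AI channels only after it passes the brand's own accuracy rules, so inconsistencies are caught on the retailer's side rather than in front of a shopper.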
One organization currently using the tool is Pandora. The company’s Chief Digital & Technology Officer, David Walmsley, discussed the importance of being able to deliver dependable AI, stating:
“This is more than just commerce; it’s about using trusted AI and unified data to guide every customer, making them feel understood, supported, and valued.”
Walmsley’s comments make clear that when AI steers purchasing decisions, trust becomes a direct revenue lever.
Fixing the Problem Long Before It Shows Up
Viewed together, Salesforce’s recent updates follow a clear logic: push reliability upstream.
The vendor is building it into the training data, the simulation environment, the risk checks, the decision-making logic, and the human oversight loops.
The logic appears to be: if reliability becomes an afterthought, AI remains a gamble; if it becomes structural, the technology scales far faster.
Salesforce’s State of Service report hints at the same dynamic. AI is already reshaping frontline roles. But adoption still hinges on trust. Service teams want tools they can rely on, not systems that feel experimental.
Security leaders echo this. While 51% report delayed rollouts due to risk concerns, Salesforce’s State of IT: Security report finds unanimous agreement that AI agents can strengthen security posture – once reliability and safety controls are in place.
In other words, the enthusiasm is there, but the confidence is catching up.
Maybe reliability will prove to be the missing piece.
A Line in the Sand for AI in CX
As AI agents step further into direct customer interactions, the industry is hitting a turning point.
These systems are no longer side tools; they’re the first point of contact in many journeys. That places reliability above convenience, above cost savings, and above speed.
Silvio Savarese, Chief AI Scientist at Salesforce, linked the research to real-world outcomes:
“Our partnership with UCSF Health demonstrates how applied science translates directly to customer value, proving that when you train agents in environments that mirror real-world complexity, they perform reliably when it matters most.”
That’s the crux of Salesforce’s message. In the agentic era, reliability must be foundational, not an afterthought.
Teams that recognize this early – and harden their AI accordingly – will be best placed to maximize the technology’s potential.