Lost in Translation: Why Your Chatbot Might Be Misleading Customers

Why terms like 'likely' or 'possible' could be creating a gap in customer expectations

Chatbot Language Study
AI & Automation in CX | News

Published: February 25, 2026

Rob Wilkinson

Imagine a customer asks your chatbot if their refund will arrive by Friday. The bot calculates the odds and replies that it is ‘likely’.

To the bot, ‘likely’ might mean an 80% statistical chance. But to the anxious customer, ‘likely’ often sounds like a soft ‘yes’. If the refund does not arrive, the customer does not feel the bot made a statistical error. They feel misled.

This scenario highlights a subtle risk in chatbot language. It is not about hallucinations or wrong answers. It is about estimative uncertainty.

New research suggests that Large Language Models (LLMs) and humans interpret words of probability very differently. For CX leaders, this ‘translation gap’ represents a potential friction point that could be eroding trust.

The Mathematics Of Misunderstanding

The core of the issue lies in how we assign numbers to words. A recent study titled ‘An evaluation of estimative uncertainty in large language models’ explored this dynamic. The researchers compared how LLMs interpret probability words against human benchmarks.

The results highlighted a distinct mismatch.

When a human hears the word ‘likely’, they might internally calibrate that to a 65% chance. But the study suggests that LLMs can assign a significantly higher probability to the same term, often pushing above 80%.

This gap might seem small mathematically. In a customer service context, however, it is potentially massive. It could be the difference between managing expectations and setting a customer up for disappointment.

If your automated agent uses confident language to describe uncertain outcomes, it risks overpromising. The bot isn’t lying. It is simply speaking a different statistical dialect than your customer.

Mayank Kejriwal, Research Associate Professor at the University of Southern California, summarizes the research in an article on Fortune:

“An AI model might use the word ‘likely’ to represent an 80% probability, whereas a human reader typically interprets it as closer to 65%.”
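The mismatch Kejriwal describes can be sketched in a few lines. This is a minimal illustration, not part of the study: the two probabilities are the rounded figures from the quote above, and the function name is our own.

```python
# Illustrative calibration gap between what a model "means" by a
# probability word and how a typical human reads it. The 0.80 / 0.65
# figures are the rounded numbers quoted above; everything else here
# is a hypothetical sketch, not the study's methodology.
WORD_CALIBRATION = {
    # word: (model_probability, human_probability)
    "likely": (0.80, 0.65),
}

def calibration_gap(word: str) -> float:
    """Return how far the model's meaning sits above the human reading."""
    model_p, human_p = WORD_CALIBRATION[word]
    return model_p - human_p

gap = calibration_gap("likely")
print(f"'likely': model means 80%, customer hears 65% (gap: {gap:.0%})")
```

A 15-point gap is exactly the space in which a refund that was "likely" to arrive Friday becomes, in the customer's mind, a broken promise.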

Context Changes Everything

The risk becomes more complex when you factor in context. The study indicates that LLMs are highly sensitive to how a prompt is phrased.

Changing the language of the prompt or the framing of the question can shift the bot’s probability estimation. A bot might interpret ‘likely’ differently in a financial context versus a casual conversation.

This variability makes it difficult for conversation designers to guarantee a consistent experience. A human agent knows that telling a banking client “funds will ‘likely’ clear” carries more weight than telling a shopper “this shirt will ‘likely’ fit.”

An LLM may not intuitively grasp that emotional weight without strict guardrails.

The ‘Safe Word’ List For CX

This research does not mean we should pull the plug on generative AI. It means we need to be more deliberate about the vocabulary we allow our agents to use.

CX leaders should consider auditing their system prompts for ‘fuzzy’ language.

Avoid unanchored adjectives. Words like ‘maybe’, ‘perhaps’, and ‘likely’ are open to interpretation. They are safe for casual chat but dangerous for transactional promises.

Use data, not vibes. Instead of letting the bot say “delivery is expected soon,” instruct it to provide the specific window. “Delivery is estimated between 2 PM and 4 PM” is far safer.

Visuals over text. Where possible, use visual confidence meters or status bars. A green ‘High Likelihood’ badge is often clearer to a user than a paragraph of text that tries to hedge its bets.
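The first two tips lend themselves to a simple automated check. Below is a minimal sketch of a 'fuzzy language' audit that flags unanchored hedge words in a bot reply; the word list and sample replies are our own illustrations, not taken from the study.

```python
import re

# Hedge words that are open to interpretation in transactional replies.
# This list is illustrative; tune it to your own brand vocabulary.
FUZZY_WORDS = {"likely", "maybe", "perhaps", "probably", "possibly", "soon"}

def audit_reply(reply: str) -> list[str]:
    """Return any unanchored hedge words found in a bot reply."""
    tokens = re.findall(r"[a-z']+", reply.lower())
    return [t for t in tokens if t in FUZZY_WORDS]

vague = "Your refund will likely arrive soon."
anchored = "Your refund is estimated to arrive between 2 PM and 4 PM on Friday."

print(audit_reply(vague))     # → ['likely', 'soon']
print(audit_reply(anchored))  # → []
```

A check like this could run over system prompts and logged transcripts alike, surfacing the transactional replies that rely on vibes instead of a specific window.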

Calibrating For Trust

The goal of AI in CX is to reduce effort and increase satisfaction. Precision is a key part of that equation.

We spend a lot of time worrying about AI getting the facts wrong. We need to spend equal time ensuring it gets the nuance right.

If we ignore this calibration gap, we risk a paradox. We could build bots that are technically accurate but still leave customers feeling misled.

Source: 'An evaluation of estimative uncertainty in large language models'


