First-of-Its-Kind Insurance Targets Costly AI Hallucinations in Customer Service and Beyond

After many serious AI mishaps in customer service, Lloyd’s of London has devised a new plan to support affected brands

3
Conversational AILatest News

Published: May 12, 2025

Floyd March headshot

Floyd March

Lloyd’s of London insurers have introduced a new product to cover losses caused by malfunctioning AI tools.

The product protects companies against legal claims relating to issues such as chatbot errors and hallucinations.

Indeed, if an AI tool harms customers or third parties, the insurance may cover costs such as damages and legal fees.

The policy, developed by Armilla – a specialized AI insurance and assessment solutions provider – and underwritten by multiple Lloyd’s insurers, underscores the growing need for risk management in AI adoption.

Karthik Ramakrishnan, CEO of Armilla, believes this coverage could encourage hesitant businesses to adopt AI tools without fearing catastrophic failures.

According to Ramakrishnan, while some insurers have already included AI-related losses within broader technology policies, these generally come with low payout limits, leaving businesses vulnerable to large-scale AI failures.

“We assess the AI model, get comfortable with its probability of degradation, and then compensate if the models degrade,” he told the Financial Times.

Armilla’s policy provides performance-based coverage, meaning payouts are triggered only if the AI performs significantly below its initial expectations.

For instance, if a chatbot initially provided correct information 95 percent of the time but its accuracy later dropped to 85 percent, the insurance could cover associated losses.

This selective underwriting approach ensures that insurers only cover sufficiently reliable AI systems, limiting exposure to excessively flawed technologies.

Where Have We Seen Things Go Wrong in Customer Service?

There are increasing instances of AI errors, from comical mishaps to more reputationally damaging hallucinations.

Take Virgin Money, for example. A chatbot reprimanded one unsuspecting customer for using the word ‘virgin’ in a customer service query.

In the original exchange, David Burch asked: “I have two ISAs with Virgin Money; how do I merge them?”

In what can only be seen as the AI version of a teenager’s bashful response to such a question, the bank’s chatbot responded: “Please don’t use words like that. I won’t be able to continue our chat if you use this language,” deeming the word “virgin” inappropriate.

On the more serious side, a small claims court ruled that Air Canada should compensate one of its customers who was misled into paying for full-price flight tickets by a contact center chatbot.

Other examples of customer service AI gone wrong include DPD’s chatbot swearing, NYC’s model telling small business owners to break the law, and Cursor’s bot inventing company policies.

How to Avoid Similar Pitfalls

Perhaps it goes without saying, but maintaining an accurate and up-to-date knowledge base is essential to safeguarding the performance of customer service chatbots.

After all, most modern models leverage retrieval-augmented generation (RAG) to interpret customer intent and generate responses dynamically by scouring knowledge base content.

So, if these knowledge bases frequently contain gaps or outdated information, mistakes will occur.

Of course, new models may be able to reason and only respond if confident of the correct answer. Yet, the risk still exists.

Additionally, businesses should take a modular approach to conversation automation to ensure accuracy.

Such an approach considers the brand’s top five or ten contact reasons first, optimizes the knowledge and resolution workflows around these intents, and tests extensively.

In doing so, brands can add guardrails so the bot escalates anything outside those intents to protect against inaccuracies.

Over time, companies can take on new intents and automate more of the contact center.

When testing their bots, contact centers can also use generative AI to stress test.

Indeed, LLMs can create training data to try and get past the agent’s guardrails. The aim here is to break the bot before the customer does.

While all this guidance is key, remember: the biggest risk with AI is trusting it too much.

 

Artificial IntelligenceChatbotsVirtual Agent
Featured

Share This Post