5 Tech Advances That Have Made Voicebots Much More Viable

Discover how recent improvements to voicebot modeling have made contact center voice automation much more appealing

Sponsored Post

Published: June 6, 2023

Charlie Mitchell

Cast your mind back to 2019. Game of Thrones came to a close, royal babies were born, and everybody hated chatbots. 

Well, not everybody. But, according to Forrester Research, most consumers did.  

Indeed, the analyst found that 54 percent of US online consumers believed that interacting with a chatbot had a “negative impact on their quality of life.” Yikes! 

Thankfully, conversational AI has come on in leaps and bounds since then, and interest in voicebots specifically has spiked.  

Just take a look at this Google Trends graphic, which shows search volumes for “Voice AI”. 

Much of this interest likely stems from the hype currently swirling around ChatGPT and generative AI. 

Indeed, such technologies already augment many enterprise conversational AI platforms. 

Nonetheless, many more technological advancements have contributed to the customer-friendly, value-adding voicebots of 2023.   

Here are five examples of these advancements that have made the technology a much more viable option for businesses of many shapes and sizes.  

1. Speech-to-Text

In March 2020, Statista published a study that found the average accuracy rate for speech-to-text automated transcription models across various industries was 77 percent. 

In other words, the average model could only transcribe 77 words out of 100 accurately.  

However, fast-forward three years, and the technology is much more advanced. Indeed, Microsoft and Amazon have reached 95.9 and 95.6 percent accuracy rates, respectively.   
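To put those percentages in context, the standard metric behind them is word error rate (WER), where accuracy is roughly one minus WER. Below is a minimal Python sketch of the calculation – a word-level edit distance between a reference transcript and the model's output (real benchmarks also normalise casing and punctuation first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 95.9 percent accuracy rate corresponds to a WER of roughly 0.04 –
# about one error every 20 to 25 words.
```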

A one-in-20-word error rate may still sound bad, but in practice it is not. As Pierce Buckley, CEO & Co-Founder at babelforce, explains:  

“Customer conversations will likely involve simple language, not tricky terminology from a science paper or complex book – which the AI will struggle with.”

As such, voicebots for simple use cases – such as gathering customer information upfront before a live agent interaction – now achieve well over an 80 percent success rate, according to Buckley.  

In addition, much of the remaining sub-20 percent will fail because of issues unrelated to the capabilities of the conversational AI model.  

For instance, perhaps an integrated system could not locate the customer ID number, or the customer has no existing record. Such problems are most likely to cause voicebot failures. 
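Handling those failure modes is less about the AI model and more about plain branching logic around the integration. Here is a minimal sketch of that idea, assuming a hypothetical CRM client with a find_by_phone lookup:

```python
def identify_caller(phone_number: str, crm) -> dict:
    """Look up the caller's record; escalate gracefully if it is missing.

    `crm.find_by_phone` is a hypothetical integration call, used purely
    for illustration.
    """
    record = crm.find_by_phone(phone_number)
    if record is None:
        # Not a speech-recognition failure: the customer simply has no
        # existing record, so hand over to a live agent with context.
        return {"action": "route_to_agent", "reason": "no_customer_record"}
    return {"action": "continue_flow", "customer_id": record["id"]}
```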

2. Text-to-Speech

Think of how we speak as humans, constantly changing our emphasis – in very subtle ways – to convey tone and often meaning. For a voicebot, that is much trickier. 

Conventionally, voicebot vendors will take one of two approaches to overcome that issue.  

First, they may employ a predictive, statistics-based neural network or other stochastic model.  

The alternative is human-written rules, where developers can use control mechanisms within the voicebot to place emphasis. 

Yet, as voicebots advance, vendors are finding a happy medium. Sharing why, Buckley says: 

“Imagine writing out a sentence that has two places for emphasis. You want to be able to tell the bot to do that. You don’t want to wait six months until the neural network has enough data for that type of request, so it does it automatically. It should go live on Friday.”

As such, interfaces will allow developers to add points of emphasis – or “prosodic markers”, as linguists would say – so the bot says something in the desired way. 
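What those markers look like in practice varies by vendor, but many text-to-speech engines accept W3C SSML markup, in which emphasis and pacing are written directly into the text. A purely illustrative snippet (not babelforce's own interface):

```python
# Illustrative SSML (Speech Synthesis Markup Language). Exact tag support
# varies by TTS engine; the wording and values here are made up.
ssml = """
<speak>
  Your balance is <emphasis level="strong">forty-two euros</emphasis>,
  and your next payment is due on
  <prosody rate="slow">Friday the fourteenth of July</prosody>.
</speak>
"""
```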

Nevertheless, neural networks will run within the bot, continuously learning and improving – across each language it speaks – so future generations require less manual programming.  

3. No-Code Tools

Within 30 minutes of first working on a voicebot, a business can have its first flow up and running while planning an A/B test for the following week.  

Such speed to deployment is relatively new, and much of this stems from the development of low-/no-code interfaces.  

These interfaces make the experience of building a voicebot similar to playing a video game, as developers navigate drop-down menus, connect dialogs, and select various tasks and actions. 
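Under the hood, such a flow typically compiles down to a declarative definition of dialog steps and transitions. A hypothetical sketch of a single step (the field names are illustrative, not any vendor's actual schema):

```python
# Hypothetical flow step: ask for an order number, look it up in a
# backend system, then branch on the result.
collect_order_number = {
    "id": "collect_order_number",
    "prompt": "Can you read me your order number, please?",
    "expects": "order_number",
    "actions": [
        {"type": "api_call", "target": "orders.lookup", "input": "order_number"},
    ],
    "transitions": {
        "found": "confirm_order",       # next dialog step on success
        "not_found": "route_to_agent",  # fall back to a live agent
    },
}
```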

Now, with LLMs, brands are taking this further. For example, Google is using its Bard-powered App Builder to let IT teams plot conversational flows and tweak designs automatically, using natural language prompts alone.  

Such innovation is excellent. Yet, brands must remember that the most critical factor in the success of a voicebot is how well it connects with the various other systems and data sources contact centers leverage.  

“Think of it like an iceberg,” says Buckley. “The shiny bit at the top is the AI component. Yet, the 90 percent – which your Titanic is going to bump into – is all data flows.” 

As such, brands must also work with vendors who can wrap APIs around various systems and level up the systems architecture, so IT teams can quickly optimize the bot at any time.  

4. Natural Language Understanding (NLU)

Until recently, embedding NLU into an operational AI model proved tedious. 

Conversational design teams would sit down and consider how many ways a customer could say they have a particular problem. 

That takes a long time. For example, an Australian bank found over 2,000 ways a customer could ask: “What’s my balance?” 

Someone from the bank would then have to plug each of these into the voicebot manually.   

“There are 40 or 50 different ways to simply say ‘yes’ and ‘no’ in a single language,” says Buckley. “The variability is astonishing.” 

As such, it took an incredible amount of effort to draw out all the possibilities across even the most transactional of conversations and map those to an “NLU result”. 

That process has improved in recent years, with speech analytics systems capable of understanding intent. Yet, these came at a high price point, and sequencing them with multiple other AI technologies was another kettle of fish. 

Now, with the advent of LLMs, which can detect customer intent without any prior training, that has become much simpler. 
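In outline, zero-shot intent detection with an LLM is little more than a classification prompt. A minimal sketch, assuming a hypothetical call_llm helper that wraps whichever model provider you use:

```python
INTENTS = ["check_balance", "report_lost_card", "update_address", "speak_to_agent"]

def detect_intent(utterance: str) -> str:
    """Ask the LLM to map a free-form utterance onto one known intent label."""
    prompt = (
        "Classify the customer utterance into exactly one of these intents: "
        + ", ".join(INTENTS)
        + f'.\nUtterance: "{utterance}"\nIntent:'
    )
    # call_llm is a hypothetical wrapper around your LLM provider's API.
    label = call_llm(prompt).strip()
    return label if label in INTENTS else "speak_to_agent"  # safe fallback

# e.g. detect_intent("How much money have I got left?") -> "check_balance"
```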

Indeed, many voicebot providers – like babelforce – are already augmenting their voicebot solutions with these models. 

5. Dialog Management

“There is no LLM or neural network anywhere that is good at dialogue management,” says Buckley. 

“You might think, when you use ChatGPT, that it has context, but it doesn’t.

“What the API does in the background is re-inject previous prompts along with the new prompt.” 
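In other words, the “memory” lives entirely in the request: on every turn, the whole running transcript is sent again alongside the new message. A minimal sketch of that pattern, again with a hypothetical call_llm wrapper (here taking a list of messages):

```python
history = []  # the only "context" there is

def chat_turn(user_message: str) -> str:
    """Append the new message, then re-send the entire conversation."""
    history.append({"role": "user", "content": user_message})
    # call_llm is a hypothetical wrapper; the whole history is re-injected
    # with every single request.
    reply = call_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```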

As such, tools like ChatGPT can simulate having context. But, really, they only reference previous parts of the conversation, which causes many of the funny dialogs people post online. Cue an excellent example: 

 

(Embedded Reddit post by u/dinospoon99 in r/ChatGPT: “I asked ChatGPT to play a game of Hangman… not sure it understood the rules lol”)

 

Nonetheless, it is still very impressive in some ways. For instance, it can remember people’s names and roles from the previous prompt – an astonishing leap forward. 

Also, some voicebots now utilize LLMs to detect when a customer’s intent changes midway through a conversation and pull the customer back on track.  

Thanks to these new capabilities, dialog management has improved to a degree. Yet, there is still a long way to go.  

As Buckley notes: “Everything to do with the dialog, what pattern it should happen in, and knowledge of what happened previously has to be programmed by humans.”

For this reason, the ability to reconfigure and quickly test any voicebot design – including back-end integrations – is critical (which takes us back to point three…).  

Begin Your Voicebot Journey with babelforce

Thanks, in part, to these five tech advances, the adoption of conversational AI will likely catch fire over the next three years. 

Indeed, while bots automated only 1.6 percent of agent interactions as of 2022, Gartner estimates that figure will rise to ten percent by 2026. 

That’s quite the uptick, and tech partners like babelforce are helping brands prepare for this future with excellent voice AI and expert support services. 

To learn more about how babelforce can assist your business in bringing voicebots into the contact center, cutting costs, and enhancing experiences, visit: www.babelforce.com/solutions/voicebot/  

 

 
