Enterprise-Grade Speech Recognition

Deepgram CEO reflects on how conversational audio interaction’s time has come

Published: July 14, 2020

Maya Middlemiss

I can remember experimenting with early speech-to-text software years ago: expensive applications that required hours of practice and training to use and then, self-defeatingly, hours more spent editing the output into anything meaningful or usable. As Deepgram CEO Scott Stephenson reflected, it is only recently that accuracy finally crossed the threshold at which the technology became genuinely useful, breaking out first in the domestic environment.

“Business decision makers were seeing their kids using Alexa to do their math homework for them, and suddenly saying wait a minute… We need to take another look at this.” So while the tech powering general speech recognition in home devices is not exactly the same as that used in business devices, it powerfully illustrates the potential for new ways of driving user interactions.

Driven by voice: New ways to get things done

Deepgram’s end-to-end deep learning method achieves unprecedented accuracy levels on conversational speech, which is very different from the ‘command and control’ type of input that general-purpose home devices are built to handle. Of those devices, Stephenson said: “It’s capturing audio at a much higher sampling rate, and storing it in a lossless format and then sending it over the Internet to be transcribed by dedicated servers, using a speech recognition system that is anticipating command and control to come to it. It’s expecting you to say things about maps and taking you places and addresses and things like that, or give commands like turn on the lights. It’s not expecting you to have a two-way conversation with another human for an hour, that would be out of domain for it.”

It’s this ability to learn from very specific inputs, and to adapt to individual acoustic environments, that enables Deepgram’s model to exceed 80% accuracy within a couple of weeks.

Serving the customer better

For customer service, Stephenson sees many applications for speech recognition in supporting agents who work in a distributed environment, with a range of compliance and supervision backup. One example is using natural language recognition to preemptively serve up the knowledge base article an agent needs in real time. With customer service demand surging unpredictably in our ‘new normality’, the ability to deliver consistent interaction quality at scale will be key in many verticals.

Taking a large percentage of voice interactions out of human agents’ workload altogether is another area where speech recognition like Deepgram’s will be a real game-changer for customer service, completely automating many routine functions. After years of clunky IVR evolution, during which frustrated users frequently struggled to break out of the system to talk to a human and get a problem resolved, the sophisticated tools now coming down the line can make dealing with sales and support enquiries in a self-service yet voice-driven way a straightforward and natural experience.

Talking of a bright future

As Stephenson concluded:

“It’s different now. This fully deep learning based way to do it takes a lot of tooling and investment in order to make it work correctly, but Deepgram has done it and ushered in this new way of thinking about these applications.”

So from asking our smart speaker to play that song we like, to ordering and supporting high-end goods and services, we’ll soon be seeing the powerful combination of machine learning and contextual data management deliver ever more effective voice-driven interactions in a growing range of situations.
