How speech analytics vocabulary works
Today, companies are increasingly leveraging AI algorithms to uncover customer sentiment and intent. According to the recent ITWeb/CallMiner Speech & Analytics Survey, 65% of companies are using speech or multichannel interaction analytics tools to monitor a wide variety of interactions. The success of speech analytics depends on its accuracy, which in turn relies on the vocabulary it draws on and the recognition technique in use. In its early years, speech analytics was far from sophisticated and could process audio with only about 50% accuracy. Today, advancements in AI and more intelligent speech analytics vocabulary systems have pushed this number to well over 80%.
Let us understand how this essential function – i.e., vocabulary – works, in more detail.
You can define speech analytics vocabulary as a repository of words, domain-specific terms, key phrases, and phonetic sounds that serves as the reference for speech-to-text transcription before an analytics algorithm is run on a piece of live or recorded audio. Typically, a speech analytics vocabulary will comprise between 50,000 and 100,000 words and phrases for every language, helping to create as extensive a transcript as possible.
Another associated technology that works in tandem with the vocabulary is speech adaptation. Speech adaptation lets you customise the built-in vocabulary by specifying words or phrases that the analytics engine should recognise more readily. For every keyword that’s matched with a low accuracy rate, most speech analytics tools offer alternatives. Using speech adaptation, you can specify a result other than these alternatives, thereby adding to the original vocabulary.
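The idea can be illustrated with a minimal sketch in Python. All names, thresholds and phrases below are assumptions for illustration, not the API of any real speech analytics product: the engine returns a recognised token with a confidence score plus alternatives, and the adaptation step lets a custom phrase win when confidence is low.

```python
# Minimal sketch of speech adaptation: custom phrases override
# low-confidence alternatives returned by the analytics engine.
# All names and values here are illustrative, not a real product API.

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff for a "low accuracy" match

# Custom domain phrases layered on top of the built-in vocabulary
custom_phrases = {"callminer", "lvcsr", "net promoter score"}

def adapt(recognised: str, confidence: float, alternatives: list) -> str:
    """Pick the final transcript token for one recognised unit."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return recognised  # engine was confident; keep its choice
    # Low confidence: prefer a custom phrase among the alternatives
    for alt in alternatives:
        if alt.lower() in custom_phrases:
            return alt
    return recognised  # no custom phrase matched; fall back

print(adapt("call minor", 0.4, ["call miner", "CallMiner"]))  # -> "CallMiner"
```

Real engines typically let you attach a boost weight to each custom phrase rather than a hard override, but the effect is the same: the custom entry outranks the stock vocabulary when the audio is ambiguous.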
There’s also an emerging technology called automated speech adaptation, where the tool uses machine learning to learn from your corrections and update the vocabulary accordingly.
Speech analytics vocabulary comes into play when using a Large Vocabulary Continuous Speech Recognition (LVCSR) system. In this approach, the tool analyses separate sounds (called phonemes), but instead of assembling these sounds into syllables and words, it matches the phoneme sets against a vocabulary of thousands of words and phrases. Every word is recognised against that vocabulary, and for phoneme sets matched with a below-average accuracy rate, the tool will recommend multiple word alternatives.
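The matching step can be sketched as a lookup of phoneme sequences in a pronunciation lexicon. The lexicon, the ARPAbet-style phoneme strings and the overlap scoring below are toy assumptions chosen to show the principle, not how any production LVCSR decoder scores hypotheses:

```python
# Sketch of the LVCSR matching step: recognised phoneme sequences are
# looked up in a pronunciation lexicon rather than assembled into
# syllables. Lexicon entries and phonemes are toy examples.

lexicon = {
    ("K", "AO", "L"): "call",
    ("S", "EH", "N", "T", "ER"): "centre",
    ("EY", "JH", "EH", "N", "T"): "agent",
}

def match_phonemes(phonemes, max_alternatives=3):
    """Return the best vocabulary matches for a phoneme sequence."""
    key = tuple(phonemes)
    if key in lexicon:
        return [lexicon[key]]  # exact match in the vocabulary

    # Below-average match: rank lexicon entries by phoneme overlap and
    # offer several word alternatives instead of a single guess.
    def overlap(entry):
        return len(set(entry) & set(key)) / len(entry)

    ranked = sorted(lexicon, key=overlap, reverse=True)
    return [lexicon[k] for k in ranked[:max_alternatives]]

print(match_phonemes(["K", "AO", "L"]))       # exact match -> ['call']
print(match_phonemes(["K", "AO", "N", "T"]))  # no exact match: ranked alternatives
```

Because every output must come from the lexicon, the transcript is always made of real words, which is exactly the readability advantage (and the out-of-vocabulary limitation) described below.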
Using a prebuilt speech analytics vocabulary ensures that the entire audio clip is transcribed, and the results are much easier to comprehend as the words match a pre-built vocabulary.
The alternative to LVCSR or large vocabulary-based speech analytics is the phonetic system. Here, the tool recognises specific sounds and their associated phonemes, putting together syllables and words one unit at a time. Phonetic systems also have a prebuilt vocabulary, but it is a fraction of the size used by the LVCSR approach. The advantage is that the system can recognise words outside of the vocabulary – like domain-specific names and terminology – and it also requires less computing power. On the downside, there is a higher chance of false positives, and the findings may be difficult for a human to read.
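A toy sketch makes the trade-off concrete. The phoneme-to-letter mappings below are invented for illustration; the point is that assembling output one unit at a time can transcribe a name that no vocabulary contains, but the result may not be clean, readable text:

```python
# Sketch of the phonetic approach: sounds are mapped to letters one
# unit at a time, so out-of-vocabulary words can still be emitted,
# at the cost of more false positives. Mappings are illustrative only.

phoneme_to_graphemes = {
    "K": "c", "AO": "a", "L": "l", "M": "m",
    "AY": "i", "N": "n", "ER": "er",
}

def assemble(phonemes):
    """Build a word unit by unit; unknown sounds pass through as-is."""
    return "".join(phoneme_to_graphemes.get(p, p.lower()) for p in phonemes)

# A brand name absent from any vocabulary can still be transcribed,
# but the spelling comes out approximate and harder to read:
print(assemble(["K", "AO", "L", "M", "AY", "N", "ER"]))  # -> "calminer"
```

Note the output: the word is recoverable, but it is not a dictionary spelling, which is why phonetic transcripts tend to be harder for a human to scan than LVCSR output.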
Importantly, both approaches rely on a speech analytics vocabulary, which remains the foundation for accurate speech-to-text matches.