The Difference Between Speech & Text Analytics



Published: June 7, 2021

Anwesha Roy

When assessing contact centre software, one finds that speech and text analytics often go hand in hand. Genesys, for instance, uses speech and text analytics to process all interaction data. Aspect, too, uses the two in tandem. But does this mean that the two technologies are inseparable, and that one cannot be used without the other? Not necessarily.

There are several differences between speech analytics and text analytics, and understanding these nuances is the first step to making the most of their respective use cases.

Speech Analytics vs. Text Analytics: Definition

You can define speech analytics as a collection of programs and statistical algorithms that help to analyse live or pre-recorded calls and gather structured data from an unstructured conversation. Speech analytics is typically applied to telephonic interactions, although you could technically utilise it for any kind of audio analysis – e.g., analysing a voice snippet that a customer has shared over WhatsApp.

Text analytics, on the other hand, can be defined as a technology that extracts meaningful, structured information from written text by equipping the machine to decode and understand natural language as humans write it. Text analytics is widely applied in CX management and improvement – across chat, social media, email, etc.

Can Speech Analytics and Text Analytics Overlap?

For contact centre and CX management use cases, there is a definite overlap between speech analytics and text analytics. That is because, in most cases, speech is first converted into text using Large Vocabulary Continuous Speech Recognition (LVCSR) or phonetic systems. This means that uttered speech is either transcribed directly into a series of known words or phrases, or converted into a series of phonetic sounds that are then strung together to form words, phrases, or sentences.

Once this text input is ready, the system applies text analytics techniques such as sentiment analysis, word frequency analysis, and text classification to get meaningful insights out of the data.
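
To make two of these techniques concrete, here is a minimal Python sketch of word frequency analysis and a very simple lexicon-based sentiment score applied to a transcript. The word lists, the function names, and the sample transcript are all assumptions made up for illustration; production systems typically rely on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE = {"great", "thanks", "happy", "helpful"}
NEGATIVE = {"cancel", "angry", "terrible", "waiting"}
STOPWORDS = {"i", "the", "a", "to", "my", "is", "and", "been", "for", "have", "this"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=3):
    """Count non-stopword tokens and return the most common ones."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(top_n)


def sentiment_score(text):
    """Return the positive-word count minus the negative-word count."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


# A made-up transcript, standing in for the output of a speech-to-text step.
transcript = (
    "I have been waiting for a refund. "
    "This is terrible, I want to cancel my refund request."
)

print(word_frequencies(transcript))
print(sentiment_score(transcript))
```

A negative score here only means that more negative than positive lexicon words were found; it is a rough signal rather than a verdict on the customer's mood.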

This doesn’t mean that text analytics cannot work without speech analytics. You could apply text analytics to any alphanumeric string shared via chat, email, SMS, or social media, or even apply optical character recognition (OCR) to first extract text from a photograph/unstructured document and then use text analytics.

Similarly, you could use speech analytics without applying text analytics – for example, checking for a specific keyword or phrase (registered as a predefined set of sounds) for compliance purposes. In these scenarios, a full transcription of the interaction isn’t created, and text analytics does not come into play.
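
The keyword check described above can be pictured as a search over sound symbols rather than words. The following is a minimal Python sketch, assuming an upstream phonetic engine (not shown) has already turned the audio into a list of phoneme symbols. The symbols are illustrative ARPAbet-style labels rather than authoritative pronunciations, and find_phrase is a hypothetical helper, not any vendor's API.

```python
# Illustrative phoneme sequence for the registered keyword "recorded".
RECORDED = ["R", "IH", "K", "AO", "R", "D", "IH", "D"]


def find_phrase(phonemes, pattern):
    """Return the start indices where pattern occurs contiguously in phonemes."""
    n, m = len(phonemes), len(pattern)
    return [i for i in range(n - m + 1) if phonemes[i:i + m] == pattern]


# Made-up phoneme stream, roughly "this call is recorded".
stream = [
    "DH", "IH", "S",                           # this
    "K", "AO", "L",                            # call
    "IH", "Z",                                 # is
    "R", "IH", "K", "AO", "R", "D", "IH", "D"  # recorded
]

matches = find_phrase(stream, RECORDED)
print("Compliance phrase found" if matches else "Compliance phrase missing")
```

Exact matching is used here for simplicity. A real phonetic search would need to tolerate recognition errors and pronunciation variants, which this sketch does not attempt.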

Understanding Emotion – A Key Difference between Speech Analytics and Text Analytics

Finally, a key point of difference between the two technologies is the way they assess and understand customer emotion. Speech analytics uses various voice parameters that reflect changes in the autonomic and somatic nervous systems to detect different emotions, moods, and stress levels. For example, the loudness of a voice can be correlated with anger, and fast speech could suggest frustration. Text analytics, on the other hand, uses words and phrases with positive or negative connotations, as well as disfluency words (fillers such as “um” and “uh”), to gauge emotion.
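
The contrast between the two kinds of cues can be sketched in a few lines of Python. The example below computes two acoustic features (loudness as root-mean-square amplitude, and speaking rate in words per minute) and one textual feature (the share of disfluency words). The function names, the filler list, and the sample values are assumptions for illustration, and mapping these features to a specific emotion would need thresholds or a trained model that the sketch does not include.

```python
import math

# Hypothetical list of filler words treated as disfluencies.
DISFLUENCIES = {"um", "uh", "er", "hmm"}


def rms_loudness(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def words_per_minute(word_count, duration_seconds):
    """Speaking rate, given a word count and the clip duration in seconds."""
    return word_count / duration_seconds * 60


def disfluency_ratio(text):
    """Fraction of tokens in the text that are filler words."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in DISFLUENCIES for t in tokens) / len(tokens)


# Acoustic side: made-up samples and timing for a short clip.
print(rms_loudness([0.5, -0.5, 0.5, -0.5]))
print(words_per_minute(word_count=30, duration_seconds=10))

# Textual side: a made-up chat message.
print(disfluency_ratio("Um, I uh need help"))
```

The acoustic features need the audio itself, while the textual feature works on any written message, which is why the two approaches can pick up different signals from the same interaction.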
