Microsoft Boosts Contact Center Voice AI with a New Take on Speech Recognition

The new Constrained Speech Recognition capability aims to bring more clarity to customer conversations


Published: September 15, 2025

Nicole Willing

Microsoft has added a new feature to its Dynamics 365 Contact Center platform: Constrained Speech Recognition.

The capability introduces structured rules that increase the accuracy of voice input recognition.

As more contact centers apply AI tooling such as conversational analytics, agent assistance, and automation to the voice channel, speech recognition engines are playing an increasingly crucial role in the success of these implementations.

However, traditional voice recognition systems can struggle to accurately understand what customers say, because they are designed to interpret a wide range of possible words without focusing on the specific context and intent of the conversation.

Human agents naturally use contextual cues, including the subject of the call, related common phrases, and tone of voice, to anticipate and understand what the customer is likely to say. They can also account for accents, slang, muffled speech, or unexpected wording more easily than an automated system.

Constrained Speech Recognition aims to close that gap. It uses structured rules known as “grammars” to define what the system should recognize, narrowing down the words and phrases a customer is likely to use and reducing errors.

Grammars typically use the Speech Recognition Grammar Specification (“SRGS”) format, which is an industry standard that can include logic for validation, positional constraints, and checksum verification. This is key in sectors like healthcare, finance, and enterprise IT, where a misheard word or number can disrupt the customer experience.
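To illustrate, here is a minimal sketch of what such an SRGS grammar can look like. The six-digit confirmation code scenario is hypothetical; Microsoft has not published the grammars Dynamics 365 Contact Center uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical grammar: constrain recognition to exactly six spoken digits -->
<grammar version="1.0" xml:lang="en-US" mode="voice" root="code"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="code" scope="public">
    <!-- Positional constraint: the code is six digits, no more, no fewer -->
    <item repeat="6"><ruleref uri="#digit"/></item>
  </rule>
  <rule id="digit">
    <one-of>
      <item>zero</item> <item>one</item> <item>two</item> <item>three</item>
      <item>four</item> <item>five</item> <item>six</item> <item>seven</item>
      <item>eight</item> <item>nine</item>
    </one-of>
  </rule>
</grammar>
```

Checksum verification is typically layered on top of a grammar like this, either through SRGS semantic interpretation tags or in the application that consumes the recognition result.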

Additionally, grammars can help voice recognition systems detect when a user is citing an alphanumeric string like an ID number, confirmation code, or package tracking reference. They can also help identify items from a specific list.
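As a rough sketch of the application-side checks such grammars enable, the Python below validates a recognized reference against a positional pattern and a check digit. The reference format and mod-10 scheme are invented for the example; they are not a real carrier or Microsoft scheme.

```python
import re

# Hypothetical reference format, invented for this example: two letters,
# eight digits, and a trailing mod-10 check digit.
def check_digit_ok(ref: str) -> bool:
    """The last digit must equal the mod-10 sum of the eight digits before it."""
    return int(ref[10]) == sum(int(c) for c in ref[2:10]) % 10

def validate(transcript: str) -> bool:
    # Normalize the recognizer's output: callers often read codes with pauses,
    # which can surface as spaces in the transcript.
    ref = transcript.replace(" ", "").upper()
    # Positional constraint first, then the checksum.
    return bool(re.fullmatch(r"[A-Z]{2}\d{8}\d", ref)) and check_digit_ok(ref)

print(validate("ab 1234 5678 6"))  # True  (1+2+...+8 = 36, and 36 % 10 == 6)
print(validate("ab 1234 5678 7"))  # False (checksum fails, so reprompt the caller)
```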

Ultimately, this gives systems the context to recognize expected inputs, improve accuracy, and reduce error rates, particularly in noisy environments where the customer’s voice may be hard to make out.
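One way to picture the effect: when the set of expected inputs is known, a noisy transcript can be snapped to the closest allowed phrase rather than passed through verbatim. The menu options and similarity cutoff below are illustrative, not part of Microsoft’s implementation.

```python
import difflib

# Hypothetical phrase list for a returns IVR menu (illustrative only).
EXPECTED = ["start a return", "track my order", "speak to an agent", "cancel my order"]

def snap_to_expected(noisy_transcript: str, cutoff: float = 0.6) -> str | None:
    """Map a noisy transcript onto the closest allowed phrase, or None if nothing fits."""
    matches = difflib.get_close_matches(noisy_transcript.lower(), EXPECTED, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_expected("uh, track my ordah"))  # -> "track my order"
print(snap_to_expected("what's the weather"))  # -> None: out of grammar, so reprompt
```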

Sam Bobo, Senior Product Manager at Microsoft, announced the capability in a blog post, stating:

“As voice systems continue to evolve into agentic architectures with non-deterministic conversations, constraint will play a critical role in ensuring specific outputs remain accurate, secure, and user-friendly.”

While the move may not seem like a massive leap forward, contact centers have reported problems with their voice AI accurately interpreting alphanumeric strings; phone numbers and addresses are common examples. That can leave agents regularly making manual corrections or implementing workarounds that add to their workload.

Indeed, in a recent paper on customer service reps’ perceptions of AI agent-assist software, researchers found that the technology can be hamstrung by such transcription errors. The study also found this was especially the case when customers switched between languages, accents, and dialects.

Other studies have shown that poor performance of agent-assist tools can cause employees to resist using them. That’s a common problem in contact centers. In 2023, Gartner found that 45 percent of agents resist adopting new technology altogether.

Now, that’s mostly a change management issue. However, the tech itself is sometimes part of the problem. Thankfully, more advanced capabilities like Constrained Speech Recognition can go some way toward alleviating agents’ concerns and reducing customer frustration.

 
