Voice has been the backbone of customer service for decades, but in many ways, it wasn’t built for today’s products, customers, or expectations.
In an era where smartphones can stream HD video, scan barcodes, and zoom in on the tiniest detail, it’s strange to still expect people to read out serial numbers or try to describe exactly where their coffee machine is leaking.
That’s where a multimodal approach can help bridge the gap between explanation and understanding.
In a nutshell, multimodal combines voice, visuals, and messaging to make it easier for customers and agents to actually get to a fix.
“When you’re supporting physical products, voice alone is like standing next to someone but with your eyes closed,” says Gintautas Miliauskas, CEO and Co-Founder of Mavenoid.
“You can’t see what they’re doing, you can’t point to anything, and they can’t show you what they’re talking about.”
Why Multimodal Matters Now
The aim of multimodal isn’t to ditch voice; far from it. The aim is to make voice part of a richer customer experience toolkit.
Customers should be able to talk, type, snap a quick photo, or open a diagram – all in the same conversation, without feeling like they’ve been shunted into a completely different system.
This is especially critical for physical products, where the stakes can be higher, as Miliauskas explains:
“When a customer is working on their device, there’s more risk.
“If we give the wrong instructions, we could cause harm to the user or damage the device. For us, multimodal isn’t optional; it’s essential.”
A real-life example of multimodal in action involves US home ventilation brand Broan-NuTone.
When they switched to Mavenoid’s multimodal platform, service levels jumped from 43% to 80%.
Indeed, the platform helped the company achieve higher first-contact resolution, fewer mistakes, and far less back-and-forth between customers and agents.
Beyond the Obvious Metrics
While improvements in traditional metrics such as first-contact resolution, average handle time, and escalation rates are impressive, multimodal can have an impact beyond them.
The technology delivers on less tangible, but equally important, fronts, including better agent accuracy and higher customer engagement.
“With multimodal, across our customer base, we see engagement above 85%,” Miliauskas says.
“When you send someone a diagram or a warranty form instead of spelling things out, compliance is naturally higher.”
Moreover, Miliauskas details how multimodal can be very effective at removing tiny frictions:
“If someone says ‘serial number SN800Z’ over the phone and it’s misheard, you end up repeating yourself.
“When it’s entered on-screen, that friction disappears.”
When it comes to Broan-NuTone, there were several of these ‘non-metric wins’, most notably the improvements to spare parts identification.
With hundreds of SKUs to sort through, customers often bypassed online tools and called straight in, which didn’t always speed things up.
“By automating this process in the voice channel, we could send them a link to the exact spare part,” explains Miliauskas.
“They could buy it instantly without waiting for an email or typing a long URL.”
From there, the company expanded into warranty claims, installation guides, and onboarding.
Matching that resolution capacity with human agents alone would have meant hiring an extra eight full-time employees.
Instead, the business was able to improve service while keeping costs down: the CX professional’s dream.
Keeping Implementation Simple
Like anything in life, taking the first step is hard. For many people, their brain immediately races towards the worst possible scenario.
Yet, more often than not, there is no need to panic.
This is especially true in the CX and customer service space, where businesses and professionals frequently assume that new implementations are painful, drawn-out, and expensive processes.
But multimodal doesn’t have to mean a sprawling IT project.
Miliauskas says Mavenoid’s philosophy is to “use the simplest tech that solves your problem.
“Not everything needs the fanciest AI model. We have ready-made solutions that can be configured in 15 minutes.”
The vendor also offers low-code and no-code options that help non-technical teams make updates as needed.
“We reuse existing content from the digital channel, so you don’t have to create everything twice,” Miliauskas explains.
In addition, for connected devices, the platform can tap into the device’s data stream, which sometimes allows support teams to understand the problem before the customer even calls.
Looking Ahead
You may have already picked up on this, but Miliauskas is a strong advocate for the power of multimodal.
The CEO believes the technology will quickly become the default for physical-product support – and sees Mavenoid’s specialization here as a key strength.
“Most automation platforms are generalists,” he explains.
“We’ve always focused on physical products, and that’s where multimodal really shines.”
“Think of it like a trip to the doctor: you come in with a symptom, they ask the right questions, and then prescribe the right treatment. Mavenoid does the same by deciding whether the answer is voice, multimodal, or a human agent.”
For CX leaders, the takeaway is simple: in a world where speed, accuracy, and low effort define the experience, sticking to a single channel is like trying to fix a modern problem with Victorian tools.
Multimodal isn’t just an upgrade; it could very well become the baseline.
You can find out more about Mavenoid and its full suite of solutions and services by visiting the website today.
You can also read about some of the other ways that Mavenoid is deploying voice automation in this article.