Are Voice Notes the Latest Weapon in the Deepfake Arsenal?

Bad actors are using GenAI-powered audio technology to exploit unsuspecting customers.


Published: July 2, 2024

Rhys Fisher

Customers could be at risk from deepfaked voice notes, studies suggest.

As voice notes become an increasingly popular way to communicate with friends, family, and colleagues, the threat of deepfakes and cybercrime is growing.

A 2024 survey conducted by Preply found that two-thirds of American adults have sent a voice note, while 41 percent say they have noticed an uptick in voice note usage in recent years.

The popularity of voice notes coincides with a significant rise in global deepfakes, with a recent study revealing that AI-powered audio and video counterfeits have increased by 245 percent year-on-year in 2024.

Image showing global deepfake statistics for 2024.
Source: Sumsub, “Deepfake Growth in the 2024 Election Year”

The countries most affected by deepfakes in Q1 of 2024 include China, Germany, the US, and the UK. And while the UK may have seen a 10 percent drop in actual fraud cases, it was not for lack of trying: the number of deepfakes detected there still rose by 45 percent.

While the data does not differentiate between the number of audio and video deepfakes, Aaron Painter, CEO of security solutions provider Nametag, believes the use of fraudulent audio in cyber-attacks is growing:

“Taking over someone’s account lets you control whatever that account has access to. For an employee account, this means planting ransomware or accessing sensitive company data.

“For a customer account, you could hijack social media or bank accounts. Deepfakes make it easier. They are superpowers for bad actors.”

But how exactly are these bad actors accessing and exploiting customer voice recordings?

Audio Deepfakes

Deepfake voice technology uses AI to clone a person’s voice, a feat that has become simpler and more accurate thanks to advancements in generative AI (GenAI).

With the surge of homemade videos on social media platforms like Instagram, Facebook, and TikTok, fraudsters can easily find and exploit samples of people’s voices through simple online searches.

It is the proliferation and rapid sophistication of AI tools that makes deepfake voice notes an easy and obvious choice for cyber fraudsters, according to Roman Zrazhevskiy, CEO of MIRA Safety.

Zrazhevskiy believes that bad actors will use this technology to trick people into revealing account passwords, credit card details, and banking information, as he explains:

“More advanced criminals will go deeper, though, likely trying to impersonate those within your circle for an added layer of trust and urgency. Often, these schemes look to extort money or financial information.

“Though we’ll also likely see spikes in malware attacks, driven by victims prompted by their voice notes to download some app they thought a friend had recommended via voice message and followed up with a direct link.”

Indeed, the ability of GenAI to mimic voices and manipulate unsuspecting people has even been discussed by its most famous exponent, OpenAI.

In an unattributed blog post published on its website in April, the company advised organizations to move away from voice-based authentication, pointing to its new Voice Engine solution: a tool that can clone a voice to sound nearly identical to the original speaker.

The tool is currently available only in preview, and OpenAI has delayed its full release to give society time to strengthen its defenses against realistic generative models.
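
To make the alternative concrete, here is a minimal sketch, assuming a contact center that wants to swap voice matching for an out-of-band one-time passcode. The send_sms helper and the in-memory store are illustrative stand-ins for whatever delivery channel and session storage an organization actually runs.

    import hmac
    import secrets
    import time

    # In-memory store of issued passcodes. A production system would use an
    # expiring server-side store keyed by account ID instead.
    _pending = {}

    OTP_TTL_SECONDS = 300  # passcodes expire after five minutes


    def send_sms(phone_number: str, message: str) -> None:
        """Hypothetical delivery helper; swap in a real SMS or push provider."""
        print(f"[sms to {phone_number}] {message}")


    def issue_passcode(account_id: str, phone_number: str) -> None:
        """Generate a random six-digit code and send it out-of-band."""
        code = f"{secrets.randbelow(1_000_000):06d}"
        _pending[account_id] = (code, time.time() + OTP_TTL_SECONDS)
        send_sms(phone_number, f"Your verification code is {code}")


    def verify_passcode(account_id: str, submitted: str) -> bool:
        """Check the submitted code instead of matching the caller's voice."""
        code, expires_at = _pending.pop(account_id, (None, 0.0))
        if code is None or time.time() > expires_at:
            return False
        # Constant-time comparison avoids leaking information via timing.
        return hmac.compare_digest(code, submitted)

In this sketch, issue_passcode runs once the caller's registered phone number has been looked up, and verify_passcode gates any sensitive request, so a cloned voice on its own is no longer enough to get through.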

The organization also urged businesses to develop methods for tracing the origin of audiovisual content and championed the need to educate the public on identifying deceptive AI content.
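
Tracing origin is a broad problem (industry efforts such as C2PA attach signed provenance metadata to media), but the most basic building block is easy to sketch: publish a cryptographic fingerprint alongside a recording so that later copies can be checked against it. The snippet below is a minimal illustration, not a full provenance scheme, and the file it would run against is hypothetical.

    import hashlib


    def fingerprint(path: str) -> str:
        """Return the SHA-256 hash of a file's bytes as a hex string."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def matches_published_fingerprint(path: str, published_hex: str) -> bool:
        """True only if this copy is byte-for-byte identical to the published original."""
        return fingerprint(path) == published_hex

A plain hash only proves that a file is unmodified; it says nothing about a re-encoded or re-recorded copy, which is why production provenance schemes add signed metadata and robust watermarks on top.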

Audio Deepfakes in the Contact Center

OpenAI’s point about learning to identify deceptive AI content is a classic case of easier said than done. Thanks in part to the capabilities of its own technology, fraudsters can emulate voices with alarming accuracy.

Moreover, these bad actors are deploying a range of tactics to dupe people.

Some of these tactics were unpacked in a recent piece of analysis from audio traffic monitoring specialist Pindrop, which explored how audio deepfakes are being used to target contact centers.

By examining a selection of faked calls, Pindrop was able to group audio fraud attempts into the following four patterns:

  • Navigating IVR systems to steal basic account details
  • Bypassing IVR authentication
  • Altering account details
  • Mimicking IVRs

The audio security firm also provided advice on how contact centers could protect themselves against this growing threat.
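
To illustrate what that kind of protection can look like in practice, here is a minimal sketch of risk-based gating for self-service IVR actions. This is not Pindrop's method: the deepfake_risk score is assumed to come from whatever synthetic-speech detector a contact center has deployed, and the action names are invented for the example.

    from dataclasses import dataclass
    from enum import Enum


    class Decision(Enum):
        ALLOW = "allow"
        STEP_UP = "require additional verification"
        BLOCK = "block and route to the fraud team"


    # Self-service actions loosely matching the attack patterns listed above.
    SENSITIVE_ACTIONS = {"read_account_details", "authenticate", "alter_account_details"}
    HIGH_RISK_ACTIONS = {"alter_account_details"}


    @dataclass
    class CallContext:
        requested_action: str
        deepfake_risk: float      # 0.0-1.0 score from an assumed synthetic-speech detector
        failed_auth_attempts: int


    def decide(ctx: CallContext) -> Decision:
        """Gate self-service IVR actions on synthetic-speech risk and caller behavior."""
        # High-risk calls trying to change account details go straight to review.
        if ctx.requested_action in HIGH_RISK_ACTIONS and ctx.deepfake_risk >= 0.8:
            return Decision.BLOCK
        # Repeated authentication failures plus elevated risk suggests probing of the IVR.
        if ctx.failed_auth_attempts >= 3 and ctx.deepfake_risk >= 0.5:
            return Decision.BLOCK
        # Any sensitive action with non-trivial risk triggers step-up verification,
        # such as an out-of-band passcode rather than voice matching alone.
        if ctx.requested_action in SENSITIVE_ACTIONS and ctx.deepfake_risk >= 0.3:
            return Decision.STEP_UP
        return Decision.ALLOW


    print(decide(CallContext("alter_account_details", deepfake_risk=0.9, failed_auth_attempts=0)))
    # -> Decision.BLOCK

The thresholds here are arbitrary; the point is that a synthetic-sounding voice should at most unlock the least sensitive tier of self-service, never account changes on its own.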

You can find out more about this, as well as further details on how bad actors are targeting contact centers, by checking out the full article here.
