Amazon Polly: Transforming Text into Speech

Introducing Amazon Polly: The text-to-speech service

Published: August 17, 2018

Rebekah Carter

In a quest to help companies create more intelligent voice applications, Amazon released their “Polly” text-to-speech service. The solution transforms everyday text in a multitude of languages into lifelike speech so that businesses can create talking applications and speech-enabled products. The Text-to-Speech solution uses state-of-the-art deep learning technology to synthesise sounds that mimic organic human voices.

Dozens of languages are available, including German, French, Turkish, Swedish, Russian, Spanish, and English. What’s more, Polly is available on pay-as-you-go pricing: you pay per character converted to speech.

What to Expect from Amazon Polly

Amazon Polly sets itself apart from other text-to-speech services by offering a selection of more natural-sounding voices, both male and female. The fluid pronunciation that comes from Polly’s machine learning system means that you can create a virtual voice that appeals to almost any audience. Features include:

Real-time streaming: Deliver lifelike voices for conversational user experiences (UX) in real time. When text is sent to the Amazon Polly API, the system instantly returns the audio to your application.
Speech storage: Generated speech can be replayed an unlimited number of times without additional fees. You can create your speech files in formats like OGG and MP3 and serve them either from the cloud or locally through apps and devices.
Control and customise output: Users can adjust their Amazon Polly voices to suit their needs, with supported SSML tags and lexicons that help you to control aspects of speech including volume, pronunciation, pitch, speaking rate, and more.
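
As a rough illustration of the SSML control described above, the following Python sketch wraps plain text in a prosody element and, optionally, sends it to Polly. The helper names build_ssml and synthesize_to_file, the default voice "Joanna", and the output path are assumptions made for this example rather than anything specified by Amazon. The second function needs the boto3 package and valid AWS credentials, and which prosody attributes a given voice honours is worth checking in the Polly documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in an SSML <prosody> element (hypothetical helper).

    Attribute values are passed through unchanged, so the caller is
    responsible for choosing values the target voice accepts.
    """
    body = escape(text)  # escape &, < and > so the markup stays well-formed
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{body}</prosody>"
        "</speak>"
    )


def synthesize_to_file(ssml, path, voice_id="Joanna"):
    """Send SSML to Polly and save the returned MP3 audio (hypothetical helper).

    Requires boto3 and AWS credentials; the voice is an assumed default.
    """
    import boto3  # imported lazily so build_ssml works without the AWS SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id
    )
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

For example, build_ssml("Your order has shipped", rate="slow") produces markup that asks for slower delivery, and the resulting string can then be passed to synthesize_to_file to store the speech as an MP3.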

What Can Companies Do with Amazon Polly?

The use cases for a service like Amazon Polly are almost limitless. From a telephony perspective, contact centres can engage their customers with more natural-sounding voices. For instance, you can cache Polly speech output for your IVR systems with the help of Amazon Connect. You can also use the Polly API to deliver automated real-time information to customers about account data, service status, and more.
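
To make the idea of caching IVR prompts more concrete, here is a minimal Python sketch of a file-based cache. It is a generic illustration, not the mechanism Amazon Connect itself uses: the function names cache_key and get_prompt_audio are invented for this example, and synthesize stands in for whatever function actually calls the Polly API.

```python
import hashlib
from pathlib import Path


def cache_key(text, voice_id, output_format="mp3"):
    """Derive a stable file name from everything that affects the audio."""
    raw = f"{voice_id}|{output_format}|{text}".encode("utf-8")
    return f"{hashlib.sha256(raw).hexdigest()}.{output_format}"


def get_prompt_audio(text, voice_id, synthesize, cache_dir):
    """Return audio bytes for a prompt, calling `synthesize` only on a cache miss.

    `synthesize` is a caller-supplied function taking (text, voice_id) and
    returning bytes, for example a thin wrapper around the Polly API.
    """
    path = Path(cache_dir) / cache_key(text, voice_id)
    if path.exists():
        return path.read_bytes()  # cache hit: no new characters are converted
    audio = synthesize(text, voice_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Because the key covers the text, the voice, and the format, a fixed greeting is synthesised once and replayed from disk afterwards, while any change to the wording or the voice produces a fresh file.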

Amazon also suggests that Amazon Polly could be ideal for eLearning platforms, and it’s already being used by Duolingo to provide better pronunciation examples for students learning a different language. Amazon Polly generates speech in dozens of different languages, making it well suited to a global audience. Amazon even suggests that marketing professionals can use it to add audio to their content strategy. For instance, you could create a podcast entirely through Amazon Polly, or convert blog posts into speech for audience members on the go.

Twilio <Say> Supports Amazon Polly

Twilio announced support for Amazon Polly on the 6th of August. Twilio will be using the system to add more than 25 languages and 50 voices to their <Say> service. The integration will give voice developers more control over their synthesised speech, including the pitch, rate, and pronunciation of the voices they use to interact with users.

Before now, the built-in text-to-speech service from Twilio <Say> included support for three voices, each of which came with its own selection of supported languages. Now, Twilio can offer even better support to their users through state-of-the-art innovation from Amazon.
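
For developers wondering what this looks like in practice, the short Python sketch below builds a TwiML response whose <Say> element requests a Polly voice. It uses only the standard library rather than Twilio's own helper package, the function name say_twiml is invented for this example, and the "Polly.Joanna" voice name reflects our understanding of Twilio's naming convention, so it should be checked against Twilio's current voice list.

```python
import xml.etree.ElementTree as ET


def say_twiml(text, voice="Polly.Joanna"):
    """Build a minimal TwiML document with one <Say> verb (hypothetical helper).

    The default voice name is an assumption based on Twilio's "Polly." prefix
    for Amazon Polly voices.
    """
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    say.text = text  # ElementTree escapes special characters on output
    return ET.tostring(response, encoding="unicode")
```

A web application would return this string as the body of its reply to Twilio's webhook request, so that the caller hears the text spoken in the chosen voice.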
