Google Cloud has launched a new model for its Speech-to-Text (STT) API that improves accuracy across 23 languages and 61 locales.
Launched in 2017, STT powers voice-controlled apps, generates captions for videos, and processes more than one billion minutes of spoken language each month, among other capabilities.
The new model is a neural sequence-to-sequence model for speech recognition, tested across a range of use cases, noise environments, acoustic conditions, and vocabularies.
The architecture underlying it is based on cutting-edge machine learning techniques that allow speech training data to be used more efficiently.
Françoise Beaufays, Distinguished Scientist, Google Cloud Speech Team, said in a blog post:
“Enterprises and developers alike will instantly see out-of-box quality improvements when using the STT API, and while you can always tune your model for better performance, the benefits of this new architecture can be felt without any initial tuning necessary.
“With the model’s expanded support for different kinds of voices, noise environments, and acoustic conditions, you can produce more accurate outputs in more contexts, letting you more quickly, easily, and effectively embed voice technologies in applications.”
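For developers who want to try the new model from code, the snippet below is a minimal sketch of a synchronous recognition request using the google-cloud-speech Python client. The Cloud Storage URI is a placeholder, and the "latest_long" model identifier is an assumption based on Google Cloud's documented model names; check the current STT API documentation for the models available in your language and locale.

```python
# Minimal sketch: synchronous speech recognition with the google-cloud-speech
# Python client. The bucket URI and the "latest_long" model name are
# illustrative assumptions, not values from the announcement.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="latest_long",  # assumed identifier for the newer model family
)

# Replace with a real gs:// URI pointing at your own audio file.
audio = speech.RecognitionAudio(uri="gs://your-bucket/your-audio.wav")

response = client.recognize(config=config, audio=audio)

for result in response.results:
    # Each result carries one or more alternatives; the first is the most likely.
    print(result.alternatives[0].transcript)
```

Because the quality improvements ship with the API itself, an existing integration like this one should pick them up simply by selecting a model that uses the new architecture, with no retraining or tuning on the caller's side.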
That expanded support also means users can speak more naturally, and in longer sentences, to their smart home devices.