
Google Unveils Translatotron, a Speech-to-Speech Translation System

Image Courtesy of Google

Last week, Google unveiled Translatotron, an end-to-end speech translation model that converts speech from one language into another while retaining the speaker’s voice. Conventional speech translation chains together three separate components: automatic speech recognition, machine translation, and text-to-speech synthesis. Translatotron handles the whole task with a single model.

In a statement, Google AI software engineers Ye Jia and Ron Weiss explained:

“In ‘Direct speech-to-speech translation with a sequence-to-sequence model,’ we propose an experimental new system that is based on a single attentive sequence-to-sequence model for direct speech-to-speech translation without relying on intermediate text representation.

This system avoids dividing the task into separate stages, providing a few advantages over cascaded systems, including faster inference speed, naturally avoiding compounding errors between recognition and translation, making it straightforward to retain the voice of the original speaker after translation, and better handling of words that do not need to be translated.”

How Translatotron Works

According to Google, Translatotron has two primary goals: to eliminate the intermediate speech-to-text step during translation, and to avoid replacing the speaker’s voice with a generic synthesized one. In their paper, published on arXiv, the Google engineers describe a neural network that takes spectrograms of the original speech as input and generates spectrograms of the translated speech, which makes it possible to reproduce the speaker’s voice.
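To make the term concrete, here is a minimal sketch of how a magnitude spectrogram can be computed from raw audio samples. It is an illustration only, not Translatotron's actual feature pipeline: the function names, the frame size, the hop length, and the 8 kHz sample rate are all assumptions chosen for this example, and the naive Fourier transform is written for readability rather than speed.

```python
import cmath
import math


def hann_window(size):
    """Hann window that tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    window = hann_window(frame_size)
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(n_bins):
            # Naive discrete Fourier transform for frequency bin k
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


SAMPLE_RATE = 8000  # Hz, assumed for this example
FRAME_SIZE = 64

# A synthetic 1000 Hz tone stands in for recorded speech.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spectrogram = magnitude_spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spectrogram), "frames; peak at", peak_hz, "Hz")
```

Each row of the result describes how much energy the signal carries at each frequency during one short time slice. A model like the one described above works on this kind of time-frequency representation instead of on text.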

The Google AI team reported that the translation tool also relies on two separately trained components: a neural vocoder, which converts the output spectrograms to time-domain waveforms, and a speaker encoder, which preserves the original speaker’s voice in the synthesized translated speech.

The team measured translation quality using the BLEU score, a metric that rates machine translation output by how closely it matches reference translations. Translatotron’s results still trailed those of a conventional cascade system, but the engineers said the work demonstrates the feasibility of end-to-end direct speech-to-speech translation.
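For readers unfamiliar with BLEU, the following is a simplified, single-reference sketch of the idea: count how many word n-grams in the candidate translation also appear in the reference, then penalize candidates that are too short. The function names and example sentences are made up for illustration, there is no smoothing, and this is not the evaluation code the Google team used.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # any n-gram order with no match zeroes the score
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: candidates shorter than the reference are scaled down
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty
    return brevity * math.exp(log_precision_sum / max_n)


reference = "the cat is on the mat"
exact = simple_bleu("the cat is on the mat", reference)
close = simple_bleu("the cat is on a mat", reference)
unrelated = simple_bleu("dogs bark loudly", reference)
print(exact, close, unrelated)
```

An exact match scores 1.0, a near miss lands somewhere between 0 and 1, and a sentence that shares no words with the reference scores 0. Production BLEU implementations add smoothing, multiple references, and corpus-level aggregation, which this sketch leaves out.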

“We hope that this work can serve as a starting point for future research on end-to-end speech-to-speech translation systems,” the engineers wrote.

Chelle Fuertes

Chelle is the Product Management Lead at INK. She's an experienced SEO professional as well as UX researcher and designer. She enjoys traveling and spending time anywhere near the sea with her family and friends.
