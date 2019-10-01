search

IBM's New Neural Speech Synthesis Method Improves TTS Systems

IBM's new neural speech synthesis method enables text-to-speech systems to adapt to any voice just by using a small amount of data from the target speaker.

Rechelle Ann Fuertes Oct 01, 2019 at 10:15 am GMT
Image courtesy of Shutterstock

To make text-to-speech (TTS) systems less dependent on large and complex neural network models, IBM researchers developed a new method of neural speech synthesis based on a modular architecture.

The team’s method combines three deep neural networks (DNNs) with intermediate signal processing of the networks’ output to produce high-quality speech. The new TTS architecture is reportedly lightweight and can synthesize high-quality speech in real time.

In their paper on arXiv.org, the IBM researchers described how each network learns a different aspect of a person’s voice, which makes it possible to train each component independently and efficiently.

“Once the base networks are trained, they can be easily adapted to a new speaking style or voice, such as for branding and personalization purposes, even with small amounts of training data,” the team wrote.

New Method of Neural Speech Synthesis

IBM’s new method of neural speech synthesis involves three DNNs: one for prosody prediction, one for acoustic feature prediction, and a neural vocoder.

The prosody prediction network learns prosody features during training, so that at synthesis time it can predict them from the textual features extracted by the front end.

“Prosody is extremely important, not only for helping the speech sound natural and lively,” the IBM researchers noted, “but also to best-represent the specific speaker’s style in the training or adaptation data.”

The acoustic feature prediction network, in turn, provides a spectral representation of the speech in short frames of about ten milliseconds each. The actual audio is later generated from these frames.
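To make the frame rate concrete, here is a minimal sketch of the arithmetic behind ten-millisecond frames. The 16 kHz sample rate and both function names are assumptions chosen for illustration; the article does not state the sample rate IBM uses.

```python
# Illustrative arithmetic only. The 16 kHz sample rate is an assumption,
# not a figure taken from IBM's paper.
FRAME_MS = 10             # frame length mentioned in the article
SAMPLE_RATE_HZ = 16_000   # hypothetical sample rate


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> int:
    """Number of audio samples covered by one acoustic frame."""
    return sample_rate_hz * frame_ms // 1000


def frames_for_duration(seconds: float, frame_ms: int = FRAME_MS) -> int:
    """Number of whole acoustic frames needed to cover `seconds` of speech."""
    return int(round(seconds * 1000)) // frame_ms


print(samples_per_frame())        # 160 samples per frame at 16 kHz
print(frames_for_duration(300))   # 30000 frames in five minutes of speech
```

Under these assumptions, each predicted frame stands for 160 waveform samples, which is why a separate vocoder stage is needed to turn frames into audio.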

The network learns the acoustic features at training time so that, during speech synthesis, it can predict them from the phonetic labels and prosody features.

“The DNN model created represents the voice of the speaker in the training or adaptation data.”

Last but not least is the neural vocoder. This network is responsible for producing the actual speech samples from the acoustic features.

The IBM researchers trained the neural vocoder on the speaker’s natural speech samples together with their corresponding acoustic features. The vocoder is called LPCNet, and the IBM team claims to be the first to use this lightweight, high-quality neural vocoder in a fully commercialized text-to-speech system.
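The three-stage design can be sketched as a chain of interchangeable functions. This is only an illustration of the modular idea, not IBM's implementation: the class name, the stage signatures, the dummy stages, and the 160 samples per frame (10 ms at an assumed 16 kHz) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence

# Hypothetical data types passed between the stages.
Prosody = List[float]                # e.g. one duration value per phonetic label
AcousticFrames = List[List[float]]   # one feature vector per ~10 ms frame
Samples = List[float]                # raw waveform samples


@dataclass(frozen=True)
class ModularTTS:
    """Three independently replaceable stages, chained in order."""
    predict_prosody: Callable[[Sequence[str]], Prosody]
    predict_acoustics: Callable[[Sequence[str], Prosody], AcousticFrames]
    vocode: Callable[[AcousticFrames], Samples]

    def synthesize(self, phonetic_labels: Sequence[str]) -> Samples:
        prosody = self.predict_prosody(phonetic_labels)
        frames = self.predict_acoustics(phonetic_labels, prosody)
        return self.vocode(frames)


# Placeholder stages standing in for trained networks.
def dummy_prosody(labels: Sequence[str]) -> Prosody:
    return [1.0 for _ in labels]  # pretend every label lasts one frame


def dummy_acoustics(labels: Sequence[str], prosody: Prosody) -> AcousticFrames:
    # Emit int(duration) one-dimensional frames per label.
    return [[float(i)] for i, d in enumerate(prosody) for _ in range(int(d))]


def dummy_vocoder(frames: AcousticFrames) -> Samples:
    return [0.0] * (160 * len(frames))  # silence, 160 samples per frame


tts = ModularTTS(dummy_prosody, dummy_acoustics, dummy_vocoder)
audio = tts.synthesize(["HH", "AH", "L", "OW"])
print(len(audio))  # 4 frames * 160 samples = 640

# Swapping one stage leaves the other two untouched.
tts_24k = replace(tts, vocode=lambda frames: [0.0] * (240 * len(frames)))
print(len(tts_24k.synthesize(["HH", "AH", "L", "OW"])))  # 960
```

Because each stage only depends on the data handed to it, one stage can be retrained or replaced for a new voice without rebuilding the others, which is the property the researchers emphasize.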

The team wrote:

“The novelty of this vocoder is that it doesn’t try to predict the complex speech signal directly by a DNN. Instead, the DNN only predicts the less-complex glottal tract residual signal and then uses LPC filters to convert it to the final speech signal.”
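The filtering step in that quote can be illustrated with a generic all-pole LPC synthesis filter. This is a textbook sketch, not LPCNet itself: the function name is hypothetical, the coefficients are fixed rather than updated per frame, and the residual is supplied by hand rather than predicted by a DNN.

```python
from typing import List, Sequence


def lpc_synthesize(residual: Sequence[float],
                   lpc_coeffs: Sequence[float]) -> List[float]:
    """All-pole LPC synthesis: s[n] = e[n] + sum_k a[k] * s[n - k].

    `residual` is the excitation e[n]; `lpc_coeffs` holds a[1..p] using the
    sign convention of the formula above (conventions vary between texts).
    """
    speech: List[float] = []
    for n, e in enumerate(residual):
        # Linear prediction from the previously generated output samples.
        prediction = sum(
            a * speech[n - k]
            for k, a in enumerate(lpc_coeffs, start=1)
            if n - k >= 0
        )
        speech.append(e + prediction)
    return speech


# A single impulse through a one-tap filter decays geometrically.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

The point of the split is that the filter already accounts for much of the signal's structure, so the network only has to model the simpler residual.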

Once trained, IBM’s DNNs could quickly adapt to any voice just by using a small amount of data from the target speaker.

The team’s listening tests showed that the adapted networks maintained both high quality and high similarity to the original speaker, even when the adaptation data amounted to as little as five minutes of speech.

The team’s work is the basis for the new Watson TTS service.
