RealTalk: AI Company Creates Flawless Synthetic Voice Tech

Creating fake photos and videos that blur the line between what’s real and what’s not is something artificial intelligence has mastered.

Already, AI systems can fake visual reality to a significant extent, but when it comes to replicating human voice, AI stumbles on its words.

It’s hard to replicate the human voice. As we’ve seen (and heard), text-to-speech software produces robotic and unnatural voices.

Faux Rogan, RealTalk

Hashiam Kadhim, Joe Palermo and Rayhane Mama are the three AI engineers at Dessa behind what they call RealTalk, a text-to-speech AI generator.

To showcase their creation, the researchers tested it on one of the most recognizable voices in the podcasting sphere: Joe Rogan.

Hearing RealTalk’s fake Rogan voice, even Joe Rogan himself might not believe his ears.

And here’s faux Rogan talking about his fancy chimp hockey team and the bone broth and elk meat diet that would superpower them.

Can you tell which is the real and which is the faux Rogan? Take this quiz that Dessa researchers set up on a dedicated website and see if you can tell who’s who.

For Alex Krizhevsky, Dessa’s Principal Machine Learning Architect, RealTalk is “one of the coolest, but scariest, things I’ve seen yet in artificial intelligence. Unlike The Singularity, which is this theoretical thing that could happen in 40, 100 years, speech synthesis is soon going to be a reality everywhere.”

RealTalk has many potential real-world applications, such as natural-sounding voice assistants, customized voice apps, and accessibility solutions for people who rely on text-to-speech devices to communicate, such as those with Lou Gehrig’s disease.

Then there’s the flipside.

In an article published on Medium, the researchers said they’re aware of the “massive” societal implications of AI speech synthesis technologies.

Although the technical know-how and resources needed to build such advanced text-to-speech generators aren’t available to everybody today, the researchers expect that to change in the near future.

“Not just anyone can go out and do it. But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet.”