
New AI Technology Can Create Images of People's Faces Based on Voice


Only now are we getting a real sense of what AI technology could mean for our reality. We explore that reality through our senses, and this is where AI can get tricky.

AI can fool our senses of sight and hearing into believing that the images and sounds they perceive are real.

Deepfake AI can build a convincing video from just a single picture, as Samsung’s Realistic Neural Talking Head Models demonstrate.

Dessa’s RealTalk can recreate a person’s voice using only a couple of seconds of audio.

But a new AI tech from MIT takes that concept to a whole new level.

Speech2Face: Your Voice Can Give Away Your Looks to AI

Apparently, there’s a link between a person’s voice and their face, two things we don’t usually associate.

MIT researchers built a deep neural network that analyzes a short audio clip of your speech to predict what you look like. Although the images the AI generated were hazy, it was able to correctly guess the gender, age, and ethnicity of the voice owners.

In the paper “Speech2Face: Learning the Face Behind a Voice,” the researchers explained how their system works.

The team tested their neural network model on the faces of both well-known and random people. While not exact copies, the reconstructed faces bear an eerie likeness to the subjects’ real pictures.

Sample portraits of people reconstructed by MIT’s newest AI technology based on their voices | MIT

After designing their Speech2Face model, the MIT team needed a large volume of data to train it — in this case, videos of many people talking. And where better to find them than YouTube? They used millions of YouTube clips featuring over 100,000 different speakers.

“During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender, and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly.”
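The self-supervised idea the researchers describe can be sketched in a few lines: because a face and a voice co-occur in every video frame, the face encoder's output can serve as a free training target for a voice encoder — no explicit attribute labels needed. The toy code below illustrates this with a hypothetical linear "voice encoder" trained to match synthetic "face embeddings"; the encoders, dimensions, and data are illustrative stand-ins, not MIT's actual model.

```python
# Toy sketch of Speech2Face-style self-supervised training.
# Everything here (encoders, dimensions, the voice->face relation) is a
# hypothetical stand-in used only to illustrate the training idea.
import random

random.seed(0)

DIM_VOICE, DIM_FACE = 4, 3

def make_pair():
    """One 'video frame': a voice feature vector paired with the face
    embedding that co-occurs with it (the self-supervision signal)."""
    voice = [random.uniform(-1, 1) for _ in range(DIM_VOICE)]
    # Invented fixed relation standing in for real voice-face correlations.
    face = [sum(voice) * 0.5, voice[0] - voice[1], voice[2] * 0.3]
    return voice, face

data = [make_pair() for _ in range(200)]

# The "voice encoder" is a single linear layer W: voice -> predicted face embedding.
W = [[0.0] * DIM_VOICE for _ in range(DIM_FACE)]

def predict(voice):
    return [sum(w * v for w, v in zip(row, voice)) for row in W]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

lr = 0.05
for epoch in range(50):
    for voice, face in data:
        pred = predict(voice)
        # SGD on the MSE between predicted and co-occurring face embeddings;
        # no labels are needed beyond what the video itself provides.
        for i in range(DIM_FACE):
            g = 2 * (pred[i] - face[i]) / DIM_FACE
            for j in range(DIM_VOICE):
                W[i][j] -= lr * g * voice[j]

final_loss = sum(mse(predict(v), f) for v, f in data) / len(data)
print(f"final training loss: {final_loss:.4f}")
```

The point of the sketch is the supervision source, not the architecture: the "label" for each audio clip is simply the face embedding found in the same video, which is exactly the co-occurrence trick the quote describes.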

It goes without saying that such an AI system could have several applications, ranging from helping detectives solve crimes to automatically assigning a face to your home voice assistant.

Recognizing the sensitivity of facial information, the MIT researchers included a section on ethical considerations in their study, noting that their method cannot reveal a person’s true identity from their voice.

However, the study itself may have invaded people’s privacy in some ways. It’s unlikely that MIT reached out to each of the 100,000 people whose videos were used in the research.

Nick Sullivan, head of cryptography at Cloudflare, was surprised to find that his likeness had been used in the MIT Speech2Face study.

“Since my image and voice were singled out as an example in the Speech2Face paper, rather than just used as a data point in a statistical study, it would have been polite to reach out to inform me or ask for my permission,” he told Slate.



Zayan Guedim

Trilingual poet, investigative journalist, and novelist. Zed loves tackling the big existential questions and all-things quantum.
