This new AI is helping to put faces to voices… And the results are pretty amazing!


From the guy who does the voice-over for movie trailers to the announcers on the subway, our lives are full of faceless voices.

And while most of us are content to build a mental image of these disembodied orators, a group of researchers from MIT has gone a step further by creating an artificial intelligence system that can reconstruct people’s faces just by listening to their voice.

New artificial intelligence puts faces to voices. Picture by / Shutterstock

The application, called Speech2Face, is a deep neural network that was trained to recognize the correlation between voices and facial features by observing millions of YouTube videos of people talking. In doing so, it learned to associate different aspects of the audio waveform with a speaker’s age, gender, and ethnicity, as well as certain craniofacial features such as the shape of the head and the width of the nose.

When the researchers then fed the system audio recordings of people’s voices, it was able to generate an image of each speaker’s face with reasonable accuracy.

Speech2Face is able to ascertain characteristics such as age, gender, ethnicity, and head shape just from the sound of a person’s voice. MIT CSAIL/IEEE Xplore

Obviously, characteristics like hairstyle, facial hair, and certain other elements of physical appearance are impossible to predict from a person’s voice, so the developers insist that their goal was “not to predict a recognizable image of the exact face, but rather to capture dominant facial traits of the person that are correlated with the input speech.”

The researchers say this technology could one day find a range of useful applications, such as generating faces for video calls without the need for cameras.

However, the system clearly still needs some work. While the images created by Speech2Face generally match the speaker’s face type, they often bear only a general resemblance to the actual person. The system also makes occasional errors: roughly 6 percent of the faces it created were of the wrong gender, and some were of the wrong ethnicity.

Nevertheless, faceless voices are one step closer to becoming a thing of the past, which should have major implications for prank callers, at least.
