We Tried New AI Lip Reading Tool—Here’s How it Fared

Audio tech startup Symphonic Labs has launched an online tool showcasing how its AI performs at lip reading. We put it to the test.

The San Francisco and Canada-based startup creates what it calls tools for "multimodal speech understanding," with applications including voice calls in extremely noisy environments or whispering to your voice assistant in public.

"Wanna know what people like Blake Lively, Taylor Swift, LeBron James, and more are saying when the microphones aren't around? We just launched readtheirlips.com, allowing you to upload a video of any speaker and identify inaudible speech using our AI Lip Reading model," the startup posted on LinkedIn.

Anyone can upload a short video clip to the site, and it will return text of what it calculates is being said. The video must clearly show the face and lips of the speaker.

We tested Symphonic Labs' lip reading AI on a 26-second Getty Images clip of U.S. Vice President Kamala Harris speaking at an event on Gun Violence Awareness Day at Kentland Community Center on June 7, 2024, in Landover, Maryland.

For the most part, the software was fairly accurate, but it got some minor parts of the speech wrong, e.g. "to try to comfort them" instead of "to try and comfort them," and made some moderate errors too: "will recall every day in gun violence" instead of "or what we call everyday gun violence." Overall, as long as the face was clearly visible, it seemed fairly accurate.

VP Kamala Harris during the presidential debate and Taylor Swift, who has endorsed the Democratic presidential candidate. Symphonic Labs has tested its lip reading AI on Taylor Swift, while Newsweek tested it on footage of…


SAUL LOEB and ANDRE DIAS NOBRE/AFP via Getty Images

We also tested it on some silent film era clips to see how it fared with grainy old black and white footage. While we can't confirm what was actually being said, it was interesting to see what movie stars like Gloria Swanson might have been saying.

In a 23-second newsreel clip from 1925, Swanson can be seen on a ship in New York Harbor with the Statue of Liberty in the background. The clip is silent and voiced over by a newsreader. Symphonic Labs' software guesses the actor is turning to her husband and saying something approximating "I've been doing this for a long time, I've been doing it for so long," as she waves to the camera.

Readtheirlips.com is a showcase of what Symphonic Labs is really working on. Its macOS application, called MAMO, integrates this technology with personal computers, allowing the user to issue voice commands "without making a sound," Chris Samra, an engineer with the startup, posted on X (formerly Twitter).

Speaking to Newsweek, Samra said that the reason he and his co-founder created the startup was "to build an interface that felt telepathic, without the need for an implant or bulky hardware."

"In terms of novelty, our AI model serves two purposes. On the one hand, it can let anyone communicate 3x faster than typing without making a sound, and on the other hand, it has the ability to analyze speech at long distances or in loud environments," added Samra.

He explained that readtheirlips.com is more of a tech demo and "not our main goal in the long term," though "it's amazing to see people try to decode inaudible videos from the past that otherwise couldn't have been decoded without our model."

"I really think the big opportunities are in enabling the mass consumer to use conversational interfaces with much less friction, and accessibility for people with dysphonia, RSI, and those who are hard of hearing," said Samra.

A new update to this software now allows users to add personal context and vocabulary, meaning they can train it to work better with their voice and other interactions.

"You can dictate in public and noisy environments, and it will transcribe for you by reading your lips. No vocalization, extra hardware, or wearable mic required," added Samra.

This could prove useful for many. A PwC survey on how U.S. consumers interact with voice assistants found that most people feel uncomfortable using them in public.

"Despite being available everywhere, three out of every four consumers (74 percent) are using their mobile voice assistants at home. The majority of focus group participants were quick to say that they prefer privacy when speaking to their voice assistant and that using it in public 'just looks weird'," said the report.
