These AI glasses can read lips
These wearable AI glasses read your face and mouth movements to translate silent speech: the quiet revolution of EchoSpeech glasses.
PhD student Ruidong Zhang appears to engage in an animated soliloquy, but what seems like a private dialogue is a silent dance with technology. Wearing EchoSpeech glasses, a brainchild of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, Zhang silently mouths commands, from unlocking his smartphone to orchestrating his playlist.
The EchoSpeech glasses may look like ordinary eyewear, but they are, in fact, a silent-speech recognition interface powered by acoustic sensing and AI. These glasses, developed by the SciFi Lab, can discreetly decipher up to 31 unvocalized commands based on lip and mouth movements.
Ruidong Zhang’s study, “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” will be presented at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in Hamburg, Germany.
The team’s work is vitally important if we want to further unlock the potential of AI, according to Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science and director of the SciFi Lab. “Basic human activities such as eating and drinking remain challenging for AI to track and recognize,” he says. “This challenge stems from the absence of suitable sensing technologies capable of capturing high-quality behavioral data from users in natural settings (in the wild). These sensing systems must be unobtrusive, energy-efficient, and privacy-conscious.”
Equipped with microphones and speakers smaller than pencil erasers, EchoSpeech glasses function as an AI-powered sonar system. They send and receive soundwaves across the face, sensing mouth movements to interpret commands and speech. The SciFi Lab’s deep learning algorithm, boasting 95% accuracy, analyzes these echo profiles in real time and can generate text on a smartphone. The result is a small, low-power, and privacy-sensitive wearable that requires just a few minutes of user training data and runs on a smartphone.
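To make the sonar idea concrete, here is a minimal sketch of how an echo profile could be computed. It is an illustration under stated assumptions, not the SciFi Lab's implementation: the chirp probe signal, the 18 to 21 kHz band, the 48 kHz sample rate, the simulated echo paths, and every function name are hypothetical. The speaker emits a short frequency sweep, the microphone records delayed copies of it, and cross-correlating the two yields one value per candidate delay. When the mouth moves, the path lengths change and the peaks in the profile shift.

```python
import math

SAMPLE_RATE = 48_000  # Hz; assumed, not taken from the article


def make_chirp(n_samples, f_start, f_end, sample_rate):
    """Linear frequency sweep used as the transmitted probe signal."""
    duration = n_samples / sample_rate
    sweep_rate = (f_end - f_start) / duration
    return [
        math.sin(2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t))
        for t in (i / sample_rate for i in range(n_samples))
    ]


def simulate_received(tx, echoes, total_len):
    """Toy microphone signal: a sum of delayed, attenuated copies of tx.

    `echoes` is a list of (delay_in_samples, gain) pairs. Noise is omitted
    to keep the example deterministic.
    """
    rx = [0.0] * total_len
    for delay, gain in echoes:
        for i, sample in enumerate(tx):
            if delay + i < total_len:
                rx[delay + i] += gain * sample
    return rx


def echo_profile(tx, rx):
    """Cross-correlate rx with tx, giving one value per candidate delay."""
    n = len(tx)
    return [
        sum(tx[i] * rx[lag + i] for i in range(n))
        for lag in range(len(rx) - n + 1)
    ]


def strongest_delay(profile):
    """Delay (in samples) of the strongest reflection in the profile."""
    return max(range(len(profile)), key=lambda lag: abs(profile[lag]))


# Hypothetical near-ultrasonic probe: 64 samples sweeping 18 kHz to 21 kHz.
tx = make_chirp(64, 18_000, 21_000, SAMPLE_RATE)

# Two frames: the main reflection path lengthens by 3 samples between them,
# standing in for a change in mouth shape.
frame_a = simulate_received(tx, echoes=[(20, 1.0), (120, 0.4)], total_len=256)
frame_b = simulate_received(tx, echoes=[(23, 1.0), (120, 0.4)], total_len=256)

profile_a = echo_profile(tx, frame_a)
profile_b = echo_profile(tx, frame_b)

print(strongest_delay(profile_a), strongest_delay(profile_b))  # prints "20 23"
```

In a real system, a sequence of such profiles would be fed to a trained model that maps them to commands; that step is left out here because the article does not describe the model's details.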
The system was tested in a user study with 24 people and the wearing experience is, according to the team, no different from wearing normal glasses. “The appearance is largely the same as normal glasses. The only difference is that we installed a few miniature microphones and speakers on the glasses, which are very light. Because we are emitting inaudible sounds, none of the participants were able to hear the sound during their study time,” says Ruidong Zhang.
“EchoSpeech builds upon our previous work that demonstrated the effectiveness of active-acoustic sensing, similar to ‘sonar,’ on glasses or earphones for tracking facial movements and expressions by detecting subtle skin deformations. When people speak, even silently, their lips move, causing changes in the shape of the surrounding tissue and muscles on the face. We believe that by utilizing these ‘sonar’ sensors to capture the skin deformations while the user is speaking, we can effectively understand their speech intention and content, thus leading to the invention of EchoSpeech,” says Cheng Zhang.
Alongside helping visually impaired people and those with speech difficulties, these glasses could be used to communicate with others in places where speech is forbidden, impractical, or hard to understand, such as in a library or a busy public space. Beyond the futuristic allure, EchoSpeech’s true potential lies in revolutionizing accessibility. For those who cannot vocalize sound, it could serve as an excellent input for a voice synthesizer, potentially giving patients their voices back. In addition to serving those with physical needs, the device can also be paired with a stylus and used as an input device with software such as CAD. Luddites beware, you might be saying goodbye to the mouse and keyboard sooner than expected.
The SciFi Lab has long experimented with a number of wearable devices that track body, hand and facial movements using machine learning and tiny video cameras. However, it’s the shift away from a reliance on traditional video and toward acoustic sensing that’s really making the difference. The new approach offers a range of benefits, from longer battery life to better security and smaller devices.
“EchoSpeech offers a solution that is both cost-effective and power-efficient. It achieves affordability by relying on hardware components that are economical and widely available. Additionally, it presents impressive energy efficiency, capable of operating continuously on smart glasses for over 20 hours,” says Ruidong Zhang. The key to it all is the compact form factor, which lets the user move freely without bulky equipment. Part of this reduction in form factor is down to the acoustic-sensing technology, which produces less data than video and so can be transferred via Bluetooth in real time.
As for the future, SciFi Lab scientists are working on smart-glasses applications that could track facial, eye and upper body movement. “We think glass will be an important personal computing platform to understand human activities in everyday settings,” Cheng Zhang said.
Remarkably, all this performance and promise doesn’t come with an astronomical price tag. “One unique advantage of our system using acoustic sensing is that these microphones and speakers are widely available and very cheap. The cost of the entire hardware prototype is under 50 USD,” says Ruidong Zhang. Aside from the hardware, the software can be run on a commercial smartphone and there’s no need to involve the cloud at all. This means that the system only requires the glasses themselves and a phone. Incredible.