Put into words
Speech neuroprostheses may offer a way to communicate for people who are unable to speak due to paralysis or disease, but fast, high-performance decoding has not yet been demonstrated. Now, transformative work by researchers at UCSF and Berkeley Engineering shows that more natural speech decoding is possible using the latest advances in artificial intelligence.
Led by UCSF neurosurgeon Edward Chang, the researchers developed an implantable AI-powered device that, for the first time, translates brain signals into synthesized speech and facial expressions. As a result, a woman who lost the ability to speak due to a stroke was able to speak in her own voice and convey emotion using a digital avatar.
Berkeley Engineering graduate students Kaylo Littlejohn, Sean Metzger and Alex Silva were co-lead authors of the study, and Gopala Anumanchipalli, assistant professor of electrical engineering and computer sciences, was a co-author.
“Because people with paralysis can’t speak, we don’t have what they’re trying to say as a ground truth to map to. So we incorporated a machine-learning optimization technique called CTC loss, which allowed us to map brain signals to discrete units, without the need for ‘ground truth’ audio,” said Littlejohn.
“We also were able to personalize the participant’s voice by using a video recording of her making a speech at her wedding from about 20 years ago. We kind of fine-tuned the discrete codes to her voice,” said Anumanchipalli. “Once we had this paired alignment that we had simulated, we used the sequence alignment method, the CTC loss.”
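The CTC (connectionist temporal classification) loss the researchers describe scores a target label sequence against a longer stream of per-timestep predictions without requiring a frame-by-frame alignment, by summing the probability of every alignment that collapses to the target. A minimal sketch of that forward computation, in plain Python, is below; this is a generic textbook illustration of CTC, not the study's code, and the function name and toy probabilities are illustrative.

```python
import math

def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of `target` under the CTC forward algorithm.

    probs:  list of per-timestep probability distributions over the
            vocabulary (index `blank` is the CTC blank symbol).
    target: list of label indices, with no blanks.
    """
    # Interleave blanks: target [a, b] becomes [-, a, -, b, -].
    ext = [blank]
    for t in target:
        ext.extend([t, blank])
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of all alignments that end at extended
    # position s after consuming the first timesteps seen so far.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a += alpha[s - 1]             # advance by one position
            # Skip a blank only between two *different* labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    total = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(total)
```

For example, with two timesteps, a vocabulary of {blank, "a"} at 0.5 probability each, and target ["a"], the alignments "a-", "-a", and "aa" all collapse to "a", giving total probability 0.75 and a loss of -log(0.75). Because the loss sums over all alignments, a model can be trained on (brain signal, text) pairs alone, which is what removes the need for ground-truth audio.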
Learn more:
- Novel brain implant helps paralyzed woman speak using a digital avatar
- How artificial intelligence gave a paralyzed woman her voice back (UCSF)
- A high-performance neuroprosthesis for speech decoding and avatar control (Nature)