Have you ever longed for a technology that could convert silent words into audible speech? Well, you no longer have to wait: researchers from UC Berkeley have come up with an AI model that does exactly that.
Researchers have attempted this task before using electromyography (EMG) signals. What separates the model in question from earlier work is that previous models focused on recovering audio from EMG captured during vocalized speech, whereas this model's objective is to generate audible speech from silently mouthed speech.
The UC Berkeley researchers used EMG to detect silent speech, gathering EMG measurements during both vocalized and silent speech. Specifically, they used surface EMG, placing electrodes on the skin to pick up the electrical potentials produced by nearby muscle activity. This let them record the activity of the speech articulators and use it as the input signal; the electrodes were placed on the face and neck, where those muscles sit.
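To make the input concrete: a silent-speech system like this typically slices the raw multichannel EMG recording into short overlapping windows before extracting features. The sketch below is purely illustrative and not from the paper; the channel count, sampling rate, and window sizes are assumptions.

```python
import numpy as np

# Hypothetical recording: 8 EMG channels sampled at 1000 Hz (assumed values).
SAMPLE_RATE = 1000
emg = np.random.randn(8, 10 * SAMPLE_RATE)  # 10 seconds of multichannel EMG

def frame_signal(signal, frame_len, hop_len):
    """Slice each channel into overlapping frames of length frame_len."""
    n_channels, n_samples = signal.shape
    n_frames = 1 + (n_samples - frame_len) // hop_len
    frames = np.stack([
        signal[:, i * hop_len : i * hop_len + frame_len]
        for i in range(n_frames)
    ])  # shape: (n_frames, n_channels, frame_len)
    return frames

# ~27 ms windows with a ~10 ms hop, a common speech-feature frame rate.
frames = frame_signal(emg, frame_len=27, hop_len=10)
print(frames.shape)  # (998, 8, 27)
```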
To train the model, the researchers compiled a dataset of EMG signals with time-aligned audio from a single speaker during both silent and vocalized speech, comprising nearly 20 hours of facial EMG recordings.
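As a rough sketch of how such a corpus could be organized for training (the file layout, names, and feature dimensions here are assumptions, not the authors' code), each utterance can be stored as a pair of frame-aligned arrays, one for EMG features and one for audio features:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class EmgSpeechDataset(Dataset):
    """Pairs framed EMG features with time-aligned audio features per utterance.

    Assumes each utterance was preprocessed into two arrays saved on disk:
      <id>_emg.npy   with shape (T, emg_feat_dim)
      <id>_audio.npy with shape (T, audio_feat_dim)
    where both use the same frame rate so the sequences align one-to-one.
    """

    def __init__(self, utterance_ids, data_dir):
        self.utterance_ids = utterance_ids
        self.data_dir = data_dir

    def __len__(self):
        return len(self.utterance_ids)

    def __getitem__(self, idx):
        uid = self.utterance_ids[idx]
        emg = np.load(f"{self.data_dir}/{uid}_emg.npy")
        audio = np.load(f"{self.data_dir}/{uid}_audio.npy")
        # Trim to the shorter sequence in case alignment is off by a frame.
        t = min(len(emg), len(audio))
        return torch.from_numpy(emg[:t]).float(), torch.from_numpy(audio[:t]).float()
```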
The researchers stated that they relied on a WaveNet decoder to produce audio from predicted speech features.
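WaveNet is a well-known autoregressive vocoder built from stacks of gated, dilated causal convolutions conditioned on speech features. The sketch below shows that core idea in miniature; the layer count, channel sizes, and conditioning details are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class DilatedCausalBlock(nn.Module):
    """One gated, dilated causal convolution block with feature conditioning."""

    def __init__(self, channels, cond_dim, dilation):
        super().__init__()
        self.pad = (2 - 1) * dilation  # left-pad so the convolution stays causal
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2, dilation=dilation)
        self.cond = nn.Conv1d(cond_dim, 2 * channels, kernel_size=1)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, cond):
        # x: (batch, channels, T) audio activations; cond: (batch, cond_dim, T)
        h = self.conv(nn.functional.pad(x, (self.pad, 0))) + self.cond(cond)
        filt, gate = h.chunk(2, dim=1)
        h = torch.tanh(filt) * torch.sigmoid(gate)   # gated activation
        return x + self.out(h)                       # residual connection

class TinyWaveNet(nn.Module):
    """Illustrative stack of dilated blocks; real WaveNets are much deeper."""

    def __init__(self, channels=64, cond_dim=80, n_blocks=6):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, kernel_size=1)
        self.blocks = nn.ModuleList(
            DilatedCausalBlock(channels, cond_dim, dilation=2 ** i)
            for i in range(n_blocks)
        )
        self.head = nn.Conv1d(channels, 256, kernel_size=1)  # 8-bit mu-law logits

    def forward(self, audio, speech_features):
        # audio: (batch, 1, T); speech_features upsampled to length T beforehand
        x = self.inp(audio)
        for block in self.blocks:
            x = block(x, speech_features)
        return self.head(x)
```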
To convert the EMG signals into speech features, the model first runs them through a recurrent network: three bidirectional LSTM layers with 1024 hidden units each, followed by a linear projection to the speech feature dimension.
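In PyTorch terms, that transduction network is only a few lines. The sketch below follows the description above (three bidirectional LSTM layers with 1024 hidden units and a linear projection); the input and output feature dimensions are placeholders, not the paper's exact values.

```python
import torch
import torch.nn as nn

class EmgToSpeechFeatures(nn.Module):
    """Maps a sequence of EMG features to a sequence of speech features."""

    def __init__(self, emg_feat_dim=112, speech_feat_dim=80, hidden_size=1024):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=emg_feat_dim,
            hidden_size=hidden_size,
            num_layers=3,
            bidirectional=True,
            batch_first=True,
        )
        # Bidirectional output concatenates both directions: 2 * hidden_size.
        self.proj = nn.Linear(2 * hidden_size, speech_feat_dim)

    def forward(self, emg):
        # emg: (batch, T, emg_feat_dim) -> (batch, T, speech_feat_dim)
        out, _ = self.lstm(emg)
        return self.proj(out)

# Example usage with a dummy batch.
model = EmgToSpeechFeatures()
dummy_emg = torch.randn(2, 500, 112)      # 2 utterances, 500 frames each
speech_features = model(dummy_emg)
print(speech_features.shape)              # torch.Size([2, 500, 80])
```

The predicted speech features would then be handed to the WaveNet decoder described earlier to synthesize the waveform.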
According to the researchers, the model has many useful applications, including:
- Enabling speech-like communication without producing any sound.
- The potential to build a device similar to a Bluetooth headset that would let people carry on phone conversations without disturbing those around them.
- Playing a useful role in settings where silence must be maintained.
- Supporting communication for people who are unable to produce audible speech.
If you are interested in learning more about this AI model, click here to read the full paper published by the researchers.