Facebook’s new speech recognition tech eliminates the need for human transcription

Nearly every big tech company now describes artificial intelligence (AI) as the future of technology and is working towards it, and Facebook has also put a great deal of effort into AI. Speech recognition is an important area of AI that we use daily in our phones, cars and smart speakers. It is still a work in progress that many tech giants are trying to perfect.

Facebook is reporting a major breakthrough in speech recognition: the company says it has built a speech recognition method that does not rely on human-transcribed text for training.

Previously, with conventional systems, humans had to transcribe each set of audio data, and that too for every language. The new system no longer requires that. Facebook’s new unsupervised system can learn directly from human speech audio, giving it a much better sense of what human-to-human conversation actually sounds like, while also saving a great deal of time, since transcribing each data set takes hours and hours of work.

Facebook’s new speech recognition model is built on a generative adversarial network (GAN), a feedback loop between a generator and a discriminator. The generator produces new data instances, while the discriminator evaluates them for authenticity. In Facebook’s model, the generator at first produces output that is complete gibberish, and the discriminator judges each attempt. At the same time, Facebook feeds in additional text written by humans so that the discriminator can tell generated output from real text. This process is repeated until the output matches real text.
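To make the generator and discriminator loop more concrete, here is a minimal toy sketch in Python. It is not Facebook’s code and does not touch audio or text: it assumes a one-dimensional stand-in where "real" data are numbers near 4.0, the generator only learns a shift `b`, and the discriminator is a simple logistic model. The names `ToyGAN`, `step` and `train` are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class ToyGAN:
    """Illustrative 1-D GAN (an assumption, not Wav2vec-U itself)."""

    def __init__(self):
        self.b = 0.0  # generator parameter: output = noise + b
        self.w = 0.0  # discriminator weight
        self.c = 0.0  # discriminator bias

    def generate(self, noise):
        # The generator turns random noise into a candidate sample.
        return noise + self.b

    def discriminate(self, x):
        # The discriminator's estimate that x is real (between 0 and 1).
        return sigmoid(self.w * x + self.c)

    def step(self, real, noise, lr=0.05):
        fake = self.generate(noise)

        # Discriminator update: raise log D(real) + log(1 - D(fake)).
        d_real = self.discriminate(real)
        d_fake = self.discriminate(fake)
        self.w += lr * ((1.0 - d_real) * real - d_fake * fake)
        self.c += lr * ((1.0 - d_real) - d_fake)

        # Generator update: raise log D(fake), i.e. try to fool
        # the freshly updated discriminator.
        d_fake = self.discriminate(fake)
        self.b += lr * (1.0 - d_fake) * self.w


def train(steps=20, real_mean=4.0, seed=0):
    """Run the feedback loop for a few steps on synthetic numbers."""
    rng = random.Random(seed)
    gan = ToyGAN()
    for _ in range(steps):
        real = rng.gauss(real_mean, 0.5)   # a "real" sample
        noise = rng.gauss(0.0, 0.5)        # input to the generator
        gan.step(real, noise)
    return gan


if __name__ == "__main__":
    gan = train()
    print("generator shift b:", gan.b)
    print("discriminator weight w:", gan.w)
```

Each call to `step` is one round of the loop described above: the discriminator first gets better at telling real from generated samples, and the generator then adjusts itself in the direction the discriminator currently rates as more real.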

The tech giant has named its model ‘Wav2vec-U’ (U stands for unsupervised) and has started testing it. The model was tested on low-resource languages such as Swahili, which is spoken in East Africa, and Kyrgyz, the language of the Central Asian republic of Kyrgyzstan. The test results showed that the system produced 63 percent fewer errors than the next best unsupervised method. Right now the unsupervised model is as accurate as the best supervised models were a few years ago, which suggests that it could replace the current supervised approach in a few years. To accelerate development, Facebook has made the code for Wav2vec-U available on GitHub.

To enable speech recognition technology for many more languages, Facebook AI is releasing wav2vec Unsupervised, a new method to train models with no supervision whatsoever. It rivals the performance of the best supervised systems from just a few years ago. https://t.co/b6ic50AsM6 pic.twitter.com/T9tP3SNjjO
— Facebook AI (@facebookai) May 21, 2021

A new era of speech recognition is on its way, and it should prove valuable for Facebook: it would help democratize the technology and help Facebook achieve its goal of connecting billions of people in the language they prefer.

Arooj Ahmed