Collaborating Authors

AI model from OpenAI automatically recognizes speech and translates it to English


On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more. OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." It can also detect the spoken language and translate it to English.

Introducing Whisper


We've trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer.

How will OpenAI's Whisper model impact AI applications?


Were you unable to attend Transform 2022? Check out all of the summit sessions in our on-demand library now! Last week, OpenAI released Whisper, an open-source deep learning model for speech recognition. Developers and researchers who have experimented with Whisper are also impressed with what the model can do. However, what is perhaps equally important is what Whisper's release tells us about the shifting culture in artificial intelligence (AI) research and the kind of applications we can expect in the future.

OpenAI open-sources Whisper, a multilingual speech recognition system


Speech recognition remains a challenging problem in AI and machine learning. In a step toward solving it, OpenAI today open-sourced Whisper, an automatic speech recognition system that the company claims enables "robust" transcription in multiple languages as well as translation from those languages into English. Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different, according to OpenAI, is that it was trained on 680,000 hours of multilingual and "multitask" data collected from the web, which lead to improved recognition of unique accents, background noise and technical jargon. "The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition," OpenAI wrote in the GitHub repo for Whisper, from where several versions of the system can be downloaded.