Focus on Whisper, OpenAI's automatic speech recognition system - Actu IA

Oct-19-2022, 10:56:09 GMT–#artificialintelligence

OpenAI recently released Whisper, a 1.6-billion-parameter AI model capable of transcribing and translating speech audio from 97 different languages, showing robust performance on a wide range of automatic speech recognition (ASR) tasks. The model, trained on 680,000 hours of audio data collected from the web, has been released as open source on GitHub. Whisper uses a Transformer encoder-decoder architecture: the input audio is split into 30-second chunks, converted to a log-Mel spectrogram, and then passed through an encoder. Unlike most state-of-the-art ASR models, it has not been fine-tuned on a specific dataset; instead, it was trained with weak supervision on a large, noisy dataset collected from the Internet. Although it did not beat models specialized for the LibriSpeech benchmark, in zero-shot evaluations across diverse datasets Whisper proved more robust and made 50% fewer errors than those models.
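The chunking step described above can be illustrated with a minimal sketch. This is not Whisper's own code: the function name `split_into_chunks`, the 16 kHz sample rate, and the choice to zero-pad the final chunk are assumptions made for illustration. Only the 30-second chunk length comes from the article. The log-Mel spectrogram conversion that follows chunking is not shown.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # assumed sample rate in Hz (not stated in the article)
CHUNK_SECONDS = 30     # chunk length given in the article


def split_into_chunks(samples: Sequence[float],
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> List[List[float]]:
    """Split a mono waveform into fixed-length chunks.

    The final chunk is zero-padded so that every chunk has the same
    length (an assumption for this sketch).
    """
    chunk_len = sample_rate * chunk_seconds
    chunks: List[List[float]] = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short trailing chunk with silence up to the full length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 70 seconds of constant-valued dummy audio
    audio = [0.5] * (SAMPLE_RATE * 70)
    parts = split_into_chunks(audio)
    print(len(parts), [len(p) for p in parts])
```

Under these assumptions, a 70-second recording yields three equal-length chunks, the last of which holds 10 seconds of audio followed by 20 seconds of padding. Each chunk would then be converted to a log-Mel spectrogram before reaching the encoder.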



