Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
Zhang, Kevin; Chkhetiani, Luka; Ramirez, Francis McCann; Khare, Yash; Vanzo, Andrea; Liang, Michael; Martin, Sergio Ramirez; Oexle, Gabriel; Bousbib, Ruben; Peyash, Taufiquzzaman; Nguyen, Michael; Pulliam, Dillon; Donato, Domenic
arXiv.org Artificial Intelligence
This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training [1] after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseudo-labeled data yields relative Word Error Rate (WER) reductions of 11.5% and 24.3% for our asynchronous and real-time models, respectively. Additionally, the model is more robust to background noise owing to the addition of these data.

These labels are then used in traditional supervised training schemes. This line of work in turn bifurcates into two main approaches. The first approach relies on generating pseudo-labels using a pre-existing baseline model [1, 6, 7], while the second attempts to source massive amounts of data of ambiguous quality from public sources and then filter it down to a subset that is both human-labeled and high-quality [8]. Our work attempts to address the data scarcity issue head-on, leveraging both data filtering and pseudo-labeling to procure high-quality audio and labels at scale. Following the example provided by Whisper [8], we sourced speech audio data from open and fair-use sources available…
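The gains above are stated as relative WER reductions. WER counts word substitutions, deletions, and insertions against a reference transcript and divides by the number of reference words; a relative reduction of 11.5% means the new WER is 11.5% lower than the baseline WER, not 11.5 percentage points lower. The following is a minimal Python sketch of both quantities, assuming plain whitespace tokenization with no text normalization. The function names are hypothetical, and the example transcripts and WER values are illustrative only, not data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a standard Levenshtein (edit-distance) dynamic program over
    whitespace-split words. Assumption: no casing/punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional improvement over the baseline, e.g. 0.115 means 11.5% relative."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs chosen to illustrate an 11.5% relative reduction.
    print(round(relative_wer_reduction(0.200, 0.177), 3))
```

Because the reduction is relative, the same 11.5% corresponds to a smaller absolute WER drop on an easy test set than on a hard one; production scorers typically also normalize text (casing, punctuation, numerals) before alignment, which this sketch deliberately omits.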
Apr-12-2024