whisper
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
Li, Jinpeng, Pu, Yu, Sun, Qi, Zhang, Wei-Qiang
Whisper and other large-scale automatic speech recognition models have made significant progress in performance. However, their performance on many low-resource languages, such as Kazakh, is not satisfactory. It is worth researching how to utilize low-cost data to improve the performance of Whisper on under-represented languages. In this study, we utilized easily accessible unpaired speech and text data and combined the language model GPT with Whisper on Kazakh. We implemented end-of-transcript (EOT) judgment modification and a hallucination penalty to improve speech recognition performance. Further, we employed the average token log probability of the decoded output as a criterion to select samples from unlabeled speech data, and used the resulting pseudo-labeled data to fine-tune the model and further improve its performance. Ultimately, we achieved more than 10% absolute word error rate (WER) reduction in multiple experiments, and the whole process has the potential to be generalized to other under-represented languages.
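The sample-selection step described in the abstract can be illustrated with a minimal sketch. It assumes each decoded utterance comes with per-token log probabilities. The `Hypothesis` structure, the function names, and the threshold value are hypothetical and are not taken from the paper, which does not state its threshold here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Decoded transcript for one unlabeled utterance (hypothetical structure)."""
    audio_id: str
    text: str
    token_logprobs: List[float]  # log probability of each decoded token


def average_token_logprob(hyp: Hypothesis) -> float:
    """Mean per-token log probability; -inf when nothing was decoded."""
    if not hyp.token_logprobs:
        return float("-inf")
    return sum(hyp.token_logprobs) / len(hyp.token_logprobs)


def select_pseudo_labels(hyps: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Keep hypotheses whose average token log probability is at least `threshold`."""
    return [h for h in hyps if average_token_logprob(h) >= threshold]


# Toy data: the scores are made up for illustration only.
hyps = [
    Hypothesis("utt1", "confident transcript", [-0.1, -0.2, -0.3]),
    Hypothesis("utt2", "uncertain transcript", [-1.5, -2.0, -0.5]),
    Hypothesis("utt3", "", []),
]

# Assumed threshold; in practice it would be tuned on held-out labeled data.
selected = select_pseudo_labels(hyps, threshold=-0.5)
print([h.audio_id for h in selected])
```

Averaging over tokens rather than summing keeps the score comparable across utterances of different lengths. In the open-source `whisper` package, each transcription segment carries an `avg_logprob` field that could supply such a score, though whether the authors relied on that field is not stated in the abstract.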
Indigenous groups fear culture distortion as AI learns their languages
When U.S. tech firm OpenAI rolled out Whisper, a speech recognition tool offering audio transcription and translation into English for dozens of languages including Maori, it rang alarm bells for many Indigenous New Zealanders. Whisper, launched in September by the company behind the ChatGPT chatbot, was trained on 680,000 hours of audio from the web, including 1,381 hours of the Maori language. Indigenous tech and culture experts say that while such technologies can help preserve and revive their languages, harvesting their data without consent risks abuse, distorting of Indigenous culture, and depriving minorities of their rights.
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)