whisper
Quantization for OpenAI's Whisper Models: A Comparative Analysis
Automated speech recognition (ASR) models have gained prominence for applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two model variants: one optimized for live speech streaming and another for offline transcription. Notably, these models have been found to generate hallucinated content, reducing transcription reliability. Furthermore, larger model variants exhibit increased latency and pose challenges for deployment on resource-constrained devices. This study analyzes the similarities and differences between three Whisper models, qualitatively examining their distinct capabilities. Next, this study quantifies the impact of model quantization on latency and evaluates its viability for edge deployment. Using the open-source LibriSpeech dataset, this paper evaluates the word error rate (WER) along with latency analysis of whispercpp using three quantization methods (INT4, INT5, INT8). Results show that quantization reduces latency by 19% and model size by 45%, while preserving transcription accuracy. These findings provide insights into the optimal use cases of different Whisper models and edge device deployment possibilities. All code, datasets, and implementation details are available in a public GitHub repository: https://github.com/allisonandreyev/WhisperQuantization.git
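For readers unfamiliar with the metric, the word error rate mentioned in the abstract can be sketched as a word-level edit distance divided by the reference length. This is a minimal illustration, not the paper's evaluation code: the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made here, and real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Normalization (lowercasing, whitespace split) is an assumption of this
    sketch, not something specified in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Comparing this score for the same audio transcribed by a full-precision model and by a quantized model is one way to check whether quantization preserved accuracy.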
- North America > United States (0.46)
- Asia > Indonesia (0.14)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.42)
Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
Jinpeng Li, Yu Pu, Qi Sun, Wei-Qiang Zhang
Whisper and other large-scale automatic speech recognition models have made significant progress in performance. However, their performance on many low-resource languages, such as Kazakh, is not satisfactory. It is worth researching how to utilize low-cost data to improve the performance of Whisper on under-represented languages. In this study, we utilized easily accessible unpaired speech and text data and combined the language model GPT with Whisper on Kazakh. We implemented end-of-transcript (EOT) judgment modification and hallucination penalty to improve the performance of speech recognition. Further, we employed the decoding average token log probability as a criterion to select samples from unlabeled speech data and used pseudo-labeled data to fine-tune the model to further improve its performance. Ultimately, we achieved more than 10% absolute WER reduction in multiple experiments, and the whole process has the potential to be generalized to other under-represented languages.
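The sample-selection step described in this abstract, keeping only pseudo-labeled utterances whose average token log probability is high enough, might look roughly like the sketch below. The `Decoded` record, the function names, and the threshold value are hypothetical and chosen for illustration; the abstract does not state the authors' data structures or cutoff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decoded:
    """One decoded utterance (hypothetical structure for this sketch)."""
    audio_id: str
    text: str
    token_logprobs: List[float]  # log probability of each emitted token


def average_logprob(sample: Decoded) -> float:
    """Mean per-token log probability of the decoded transcript."""
    return sum(sample.token_logprobs) / len(sample.token_logprobs)


def select_pseudo_labels(samples: List[Decoded], threshold: float) -> List[Decoded]:
    """Keep samples whose average token log probability meets the threshold.

    Samples with no tokens are skipped because their average is undefined.
    """
    return [
        s for s in samples
        if s.token_logprobs and average_logprob(s) >= threshold
    ]


if __name__ == "__main__":
    decoded = [
        Decoded("a", "confident transcript", [-0.1, -0.2, -0.3]),
        Decoded("b", "uncertain transcript", [-1.0, -2.0]),
        Decoded("c", "", []),
    ]
    # The threshold of -0.5 is an arbitrary illustrative value.
    kept = select_pseudo_labels(decoded, threshold=-0.5)
    print([s.audio_id for s in kept])
```

The retained transcripts would then serve as pseudo-labels for fine-tuning, with the threshold trading off the amount of data against label quality.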
Indigenous groups fear culture distortion as AI learns their languages
When U.S. tech firm OpenAI rolled out Whisper, a speech recognition tool offering audio transcription and translation into English for dozens of languages including Maori, it rang alarm bells for many Indigenous New Zealanders. Whisper, launched in September by the company behind the ChatGPT chatbot, was trained on 680,000 hours of audio from the web, including 1,381 hours of the Maori language. Indigenous tech and culture experts say that while such technologies can help preserve and revive their languages, harvesting their data without consent risks abuse, distorting of Indigenous culture, and depriving minorities of their rights.
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)
Artificial Intelligence In Agriculture – Another Place Where Medical Techniques Can Help
As weird as it is to me to know that ranching is part of agriculture, I do find it interesting that the price points on medical technology mean that new techniques can move into the industry. Artificial intelligence in medicine has been a big part of its growth. Now AI is also moving to help ranchers better manage their herds. While much of the focus on AI in the field has been on vision and back-end analysis, another sense matters a great deal in the real world: sound. In human medicine, one of the first and easiest tools to use is the x-ray.
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (0.66)