PhoWhisper: Automatic Speech Recognition for Vietnamese
Le, Thanh-Thien, Nguyen, Linh The, Nguyen, Dat Quoc
arXiv.org Artificial Intelligence
We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved by fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates the state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets.

Automatic speech recognition (ASR) technology, also referred to as speech-to-text, has advanced significantly (Baevski et al., 2020; Barrault et al., 2023; Pratap et al., 2023), broadening its use across a wide range of applications. The state-of-the-art ASR model, Whisper (Radford et al., 2023), has become extremely popular and is widely used in both academia and industry.
Mar-27-2024