PhoWhisper: Automatic Speech Recognition for Vietnamese

Le, Thanh-Thien; Nguyen, Linh The; Nguyen, Dat Quoc

arXiv.org Artificial Intelligence 

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets.

Automatic speech recognition (ASR) technology, also referred to as speech-to-text, has advanced significantly (Baevski et al., 2020; Barrault et al., 2023; Pratap et al., 2023), extending its use to a wide range of applications. The state-of-the-art ASR model, Whisper (Radford et al., 2023), has become extremely popular and is widely used in both academia and industry.
