PhoWhisper: Automatic Speech Recognition for Vietnamese

Le, Thanh-Thien, Nguyen, Linh The, Nguyen, Dat Quoc

Mar-27-2024–arXiv.org Artificial Intelligence

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets.

Automatic speech recognition (ASR) technology, also referred to as speech-to-text, has advanced significantly (Baevski et al., 2020; Barrault et al., 2023; Pratap et al., 2023), broadening its use across a wide range of applications. The state-of-the-art ASR model Whisper (Radford et al., 2023) has become extremely popular and is widely used in both academia and industry.

