XLS-R Deep Learning Model for Multilingual ASR on Low-Resource Languages: Indonesian, Javanese, and Sundanese

Arisaputra, Panji, Handoyo, Alif Tri, Zahra, Amalia

arXiv.org Artificial Intelligence 

Automatic Speech Recognition (ASR) is a technology that automatically converts spoken language into written text. ASR research focuses on reducing the Word Error Rate (WER) when transcribing spoken input. Its core capability is to serve as an effective bridge for information exchange in human-to-human and human-to-machine communication [1]. ASR has become increasingly important in domains such as air traffic control, biometric security, gaming, closed captioning for YouTube, voice message transcription, and home automation. Its use in digital media is not new, but its complexity has increased [2]. This study focuses on Indonesia, where information and communication technology is developing rapidly. In Figure 1, data from the Central Statistics Agency (Badan Pusat Statistik, BPS) [3] show that 62.10% and 82.07% of Indonesians had access to the internet in 2021, alongside an increase in mobile phone use to 65.87%. By contrast, less mobile technologies such as computers and landline phones are being abandoned, at only 18.24% and 1.36%, respectively. This suggests that Indonesians are shifting from traditional technology to more mobile and agile devices such as smartphones, which require suitable modalities for effective and efficient operation.