Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Taguchi, Chihiro, Sakai, Yusuke, Haghani, Parisa, Chiang, David
arXiv.org Artificial Intelligence
This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.
Aug-7-2023
- Genre:
  - Research Report > New Finding (0.47)
- Industry:
  - Education (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language (1.00)
    - Speech
      - Acoustic Processing (0.47)
      - Speech Recognition (0.48)