Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

Taguchi, Chihiro, Sakai, Yusuke, Haghani, Parisa, Chiang, David

Aug-7-2023–arXiv.org Artificial Intelligence

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.
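The abstract does not say how the fine-tuned model's outputs become an IPA string. Models fine-tuned from wav2vec 2.0 for transcription commonly use a CTC output layer, so the following is a minimal sketch of greedy CTC decoding under that assumption. The vocabulary, the `one_hot` helper, and the frame scores are hypothetical toy values for illustration, not the paper's symbol set or outputs.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary (an assumption, not the paper's symbol inventory).
# Index 0 is reserved for the CTC blank symbol.
VOCAB = ["<blank>", "k", "æ", "t", "ʃ"]


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str] = VOCAB,
    blank_id: int = 0,
) -> str:
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    symbols: List[str] = []
    previous = blank_id
    for scores in frame_scores:
        # Index of the highest-scoring symbol for this audio frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the symbol changes and is not the blank.
        if best != blank_id and best != previous:
            symbols.append(vocab[best])
        previous = best
    return "".join(symbols)


def one_hot(index: int, size: int = len(VOCAB)) -> List[float]:
    """Hypothetical helper: build a score vector that peaks at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy frame sequence: k k <blank> æ æ <blank> t
frames = [one_hot(i) for i in [1, 1, 0, 2, 2, 0, 3]]
decoded = greedy_ctc_decode(frames)
```

In this sketch, consecutive frames with the same symbol collapse into one, while a blank between two identical symbols keeps both. A real system would take the frame scores from the acoustic model rather than from hand-built vectors.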

artificial intelligence, machine learning, natural language

Country:
- Africa > Uganda (0.04)
- North America > United States
  - Indiana > St. Joseph County > Notre Dame (0.04)
- Europe
  - Germany (0.14)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Russia > Volga Federal District
    - Republic of Tatarstan (0.04)
- Asia
  - Russia (0.04)
  - Japan (0.04)
  - Myanmar > Chin State
    - Hakha (0.05)

Genre:
- Research Report > New Finding (0.47)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Speech
    - Speech Recognition (0.48)
    - Acoustic Processing (0.47)
