A transfer learning based approach for pronunciation scoring

Marcelo Sancinetti, Jazmin Vidal, Cyntia Bonomi, Luciana Ferrer

May 9, 2023, arXiv.org Artificial Intelligence

Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final system is 20% better than the GOP system on EpaDB, a database for pronunciation scoring research, for a cost function that prioritizes low rates of unnecessary corrections.
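For readers unfamiliar with the GOP baseline mentioned above, the following is a minimal sketch of one common DNN-based formulation: the score of a phone is the average log posterior of the target phone over the frames aligned to it, and a phone is flagged when the score falls below a threshold. The abstract does not specify the exact formulation used in the paper, so the function names, the input layout (per-frame posterior dictionaries and a forced alignment), the probability floor and the threshold are all assumptions made for illustration.

```python
import math

# Hypothetical floor that avoids log(0) when a posterior is missing or zero.
MIN_PROB = 1e-10


def gop_scores(posteriors, alignment):
    """Compute one GOP-style score per aligned phone.

    posteriors: list with one dict per frame, mapping phone -> posterior
                probability (assumed to come from an ASR acoustic model).
    alignment:  list of (phone, start_frame, end_frame) tuples from a forced
                alignment, with end_frame exclusive.
    Returns a list of (phone, score) where score is the mean log posterior
    of the target phone over its frames (higher means better pronounced).
    """
    scores = []
    for phone, start, end in alignment:
        frames = posteriors[start:end]
        if not frames:
            raise ValueError(f"empty segment for phone {phone!r}")
        total = sum(math.log(max(f.get(phone, 0.0), MIN_PROB)) for f in frames)
        scores.append((phone, total / len(frames)))
    return scores


def flag_mispronounced(scores, threshold):
    """Mark a phone as mispronounced when its score is below the threshold."""
    return [(phone, score < threshold) for phone, score in scores]
```

Raising or lowering the threshold trades missed errors against unnecessary corrections; a cost function that penalizes unnecessary corrections more heavily, like the one the abstract mentions, would favor a lower threshold under this sketch.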

artificial intelligence, machine learning, pronunciation

Country:
- Asia (0.04)
- South America > Argentina
  - Pampas > Buenos Aires F.D. > Buenos Aires (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.88)
  - Machine Learning
    - Neural Networks (0.71)
    - Transfer Learning (0.62)
    - Performance Analysis > Accuracy (0.48)
