Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge
Aleksei Žavoronkov, Tanel Alumäe
arXiv.org Artificial Intelligence
ABSTRACT

This paper presents an analysis of three end-to-end models developed for the NOCASA 2025 Challenge, which targets automatic word-level pronunciation assessment for children learning Norwegian as a second language. Our models include an encoder-decoder Siamese architecture (E2E-R), a prefix-tuned direct classification model leveraging pretrained wav2vec2.0, and a GOP-CTC-based model. We introduce a weighted ordinal cross-entropy loss tailored for optimizing metrics such as unweighted average recall and mean absolute error. Among the explored methods, our GOP-CTC-based model achieved the highest performance, substantially surpassing challenge baselines and attaining top leaderboard scores.

Index Terms: Speech assessment, GOP, NOCASA

1. INTRODUCTION

The task of speech pronunciation assessment focuses on automatically evaluating a language learner's pronunciation of phonemes, words, or complete utterances. Such systems can be used to provide feedback in computer-aided language learning applications.
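The two evaluation metrics named in the abstract, unweighted average recall (UAR) and mean absolute error (MAE), can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the challenge's official scoring script, and the labels below are hypothetical integer scores invented for illustration. UAR averages per-class recall so that rare score classes count as much as frequent ones, while MAE penalizes predictions in proportion to their distance from the true score.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class present in y_true counts equally."""
    total = defaultdict(int)    # number of reference items per class
    correct = defaultdict(int)  # number of correctly predicted items per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between true and predicted ordinal scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced labels (assumption: scores are small integers).
y_true = [1, 1, 1, 1, 2, 3]
y_pred = [1, 1, 1, 2, 2, 1]

uar = unweighted_average_recall(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR={uar:.3f} MAE={mae:.3f} accuracy={accuracy:.3f}")
```

In this toy case the plain accuracy is higher than the UAR because the single item of class 3 is missed entirely, which is the kind of imbalance effect that motivates optimizing UAR directly.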
Sep-4-2025
- Country:
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.40)
- Europe > Estonia > Harju County > Tallinn (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.40)
- Europe > Norway (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Education (0.34)