Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning

Medin, Lucas Block; Pellegrini, Thomas; Gelin, Lucile

Mar-6-2025, arXiv.org Artificial Intelligence

Child speech recognition remains an underdeveloped research area, owing to the scarcity of data (especially for languages other than English) and the specific difficulties of the task. Having explored various architectures for child speech recognition in previous work, we turn in this article to recent self-supervised models. We first compare wav2vec 2.0, HuBERT and WavLM models adapted to phoneme recognition in French child speech, and continue our experiments with the best of them, WavLM base+. We then adapt it further by unfreezing its transformer blocks during fine-tuning on child speech, which greatly improves its performance and makes it significantly outperform our base model, a Transformer+CTC. Finally, we study in detail the behaviour of these two models under the real conditions of our application, and show that WavLM base+ is more robust to various reading tasks and noise levels.

Index Terms: speech recognition, child speech, self-supervised learning

artificial intelligence, machine learning, speech, (15 more...)

Country:
- North America > United States (0.04)
- Asia > Japan (0.04)
- Europe
  - France
    - Île-de-France > Paris
      - Paris (0.04)
    - Occitanie > Haute-Garonne
      - Toulouse (0.05)
  - Austria > Styria
    - Graz (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Machine Learning (1.00)
