JCAPT: A Joint Modeling Approach for CAPT

Yang, Tzu-Hsuan, He, Yue-Yang, Chen, Berlin

Jul-28-2025–arXiv.org Artificial Intelligence

Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrates that our model consistently outperforms prior methods, particularly on the MDD task.
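For readers unfamiliar with selective SSMs, the following is a minimal sketch of the input-dependent recurrence that Mamba-style models build on. It is not the authors' implementation: the function name `selective_scan`, the single scalar channel, and the parameters `a_diag`, `w_delta`, `w_b`, and `w_c` are assumptions made for illustration, and the phonological features, think tokens, and the APA and MDD output heads are not modeled here.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size strictly positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a_diag, w_delta, w_b, w_c):
    """Run a one-channel selective SSM over a scalar sequence xs.

    a_diag  : negative decay rates, one per state dimension (diagonal A)
    w_delta : hypothetical scalar weight producing the input-dependent step
    w_b, w_c: hypothetical per-state weights producing B_t and C_t from x_t
    """
    n = len(a_diag)
    h = [0.0] * n  # hidden state, carried across time steps
    ys = []
    for x in xs:
        # "Selective": the step size and the B/C projections depend on x_t.
        delta = softplus(w_delta * x)
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a_diag[i])  # discretized state decay
            b_bar = delta * (w_b[i] * x)         # simplified discretized B_t
            h[i] = a_bar * h[i] + b_bar * x      # state update
            y += (w_c[i] * x) * h[i]             # readout with C_t
        ys.append(y)
    return ys
```

Because the step size and projections are recomputed from each input frame, the state can retain or discard information per frame, which is the property the abstract associates with fine-grained temporal reasoning.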

artificial intelligence, machine learning, natural language

Country:
- Asia > Taiwan (0.14)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning (0.70)
  - Speech > Speech Recognition (0.48)
  - Machine Learning
    - Neural Networks (0.47)
    - Statistical Learning (0.46)
    - Inductive Learning (0.46)
