Graph Connectionist Temporal Classification for Phoneme Recognition
arXiv.org Artificial Intelligence
Automatic Phoneme Recognition (APR) systems are often trained using pseudo phoneme-level annotations generated from text through Grapheme-to-Phoneme (G2P) systems. These G2P systems frequently output multiple possible pronunciations per word, but the standard Connectionist Temporal Classification (CTC) loss cannot account for such ambiguity during training. In this work, we adapt Graph Temporal Classification (GTC) to the APR setting. GTC enables training from a graph of alternative phoneme sequences, allowing the model to consider multiple pronunciations per word as valid supervision. Our experiments on English and Dutch data sets show that incorporating multiple pronunciations per word into the training loss consistently improves phoneme error rates compared to a baseline trained with CTC. These results suggest that integrating pronunciation variation into the loss function is a promising strategy for training APR systems from noisy G2P-based supervision.
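The difference between single-target CTC and supervision over several pronunciations can be illustrated with a small brute-force sketch. It is not the authors' implementation: the helper names (`ctc_log_prob`, `expand_pronunciations`, `multi_pron_loss`), the blank index, and the toy inputs are all assumptions made for illustration. The sketch enumerates every phoneme sequence licensed by the per-word alternatives and sums their CTC probabilities. Because each frame-level alignment collapses to exactly one label sequence, this sum equals the total probability of all alignments in an unweighted graph of those alternatives. A graph-based loss such as GTC would presumably obtain the same quantity without explicit enumeration, which grows exponentially with the number of ambiguous words.

```python
import itertools
import math

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_prob(log_probs, labels):
    """Log P(labels | x) via the standard CTC forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels:    sequence of non-blank symbol indices.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    finals = [alpha[size - 1]]
    if size > 1:
        finals.append(alpha[size - 2])
    return logsumexp(finals)


def expand_pronunciations(word_prons):
    """All distinct phoneme sequences allowed by per-word alternatives.

    word_prons: one entry per word, each a list of alternative pronunciations,
                each pronunciation a list of phoneme indices.
    """
    sequences = {
        tuple(p for pron in combo for p in pron)
        for combo in itertools.product(*word_prons)
    }
    return sorted(sequences)


def multi_pron_loss(log_probs, word_prons):
    """Negative log of the summed CTC probability of every allowed sequence."""
    scores = [ctc_log_prob(log_probs, seq)
              for seq in expand_pronunciations(word_prons)]
    return -logsumexp(scores)


# Toy example: 2 frames, uniform output over {blank, phoneme 1, phoneme 2}.
frames = [[math.log(1 / 3)] * 3 for _ in range(2)]
single = multi_pron_loss(frames, [[[1]]])        # one pronunciation
multi = multi_pron_loss(frames, [[[1], [2]]])    # two alternatives, same word
```

In this toy case the loss with both alternatives is lower than the single-pronunciation loss, since probability mass assigned to either pronunciation counts as correct. With plain CTC, mass placed on the pronunciation that G2P happened not to select would be penalized.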
Sep-9-2025