Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios

Gállego, Gerard I., Pareras, Oriol, Garcia, Martí Cortada, Takanori, Lucas, Hernando, Javier

Sep-30-2025–arXiv.org Artificial Intelligence

We propose a Speech-to-Text Translation (S2TT) approach that integrates phoneme representations into a Chain-of-Thought (CoT) framework to improve translation in low-resource and zero-resource settings. By introducing phoneme recognition as an intermediate step, we enhance cross-lingual transfer, enabling translation even for languages with no labeled speech data. Our system builds on a multilingual LLM, which we extend to process speech and phonemes. Training follows a curriculum learning strategy that progressively introduces more complex tasks. Experiments on multilingual S2TT benchmarks show that phoneme-augmented CoT improves translation quality in low-resource conditions and enables zero-resource translation, while slightly impacting high-resource performance. Despite this trade-off, our findings demonstrate that phoneme-based CoT is a promising step toward making S2TT more accessible across diverse languages.

machine learning, natural language, translation, (19 more...)

arXiv.org Artificial Intelligence

Sep-30-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.68)
- Europe (0.68)
- Asia > Middle East
  - UAE (0.14)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Machine Translation (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found