Reasoning Transfer for an Extremely Low-Resource and Endangered Language: Bridging Languages Through Sample-Efficient Language Understanding
Tran, Khanh-Tung, O'Sullivan, Barry, Nguyen, Hoang D.
–arXiv.org Artificial Intelligence
Recent advances have enabled Large Language Models (LLMs) to tackle reasoning tasks by generating chain-of-thought (CoT) rationales, yet these gains have largely applied to high-resource languages, leaving low-resource languages behind. In this work, we first investigate CoT techniques in extremely low-resource scenarios through previous prompting, model-editing, and fine-tuning approaches. We introduce English-Pivoted CoT Training, leveraging the insight that LLMs internally operate in a latent space aligned toward the dominant language. Given input in a low-resource language, we perform supervised fine-tuning to generate CoT in English and output the final response in the target language. Across mathematical reasoning benchmarks, our approach outperforms other baselines with up to 28.33% improvement in low-resource scenarios. Our analysis and additional experiments, including Mixed-Language CoT and Two-Stage Training, show that explicitly separating language understanding from reasoning enhances cross-lingual reasoning abilities. To facilitate future work, we also release \emph{LC2024}, the first benchmark for mathematical tasks in Irish, an extremely low-resource and endangered language. Our results and resources highlight a practical pathway to multilingual reasoning without extensive retraining in every extremely low-resource language, despite data scarcity.
arXiv.org Artificial Intelligence
Nov-27-2025
- Country:
- Asia
- Europe > Ireland
- Munster > County Cork > Cork (0.04)
- North America
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Florida > Miami-Dade County
- Mexico > Mexico City
- Genre:
- Research Report > New Finding (0.48)
- Technology: