Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models?
Zhi Rui Tam, Cheng-Kuang Wu, Yu Ying Chiu, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee
arXiv.org Artificial Intelligence
Large reasoning models (LRMs) have demonstrated impressive performance across a range of reasoning tasks, yet little is known about their internal reasoning processes in multilingual settings. We begin with a critical question: in which language do these models reason when solving problems presented in different languages? Our findings reveal that, despite multilingual training, LRMs tend to default to reasoning in high-resource languages (e.g., English) at test time, regardless of the input language. When constrained to reason in the same language as the input, model performance declines, especially for low-resource languages. In contrast, reasoning in high-resource languages generally preserves performance. We conduct extensive evaluations across reasoning-intensive tasks (MMMLU, MATH-500) and non-reasoning benchmarks (CulturalBench, LMSYS-toxic), showing that the effect of language choice varies by task type: input-language reasoning degrades performance on reasoning tasks but benefits cultural tasks, while safety evaluations exhibit language-specific behavior. By exposing these linguistic biases in LRMs, our work highlights a critical step toward developing more equitable models that serve users across diverse linguistic backgrounds.
May 26, 2025