Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective

Yang, Wen, Wu, Junhong, Li, Chong, Zong, Chengqing, Zhang, Jiajun

arXiv.org Artificial Intelligence 

Recent advancements in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in the generalization of RL-based reasoning. While existing work has primarily investigated generalization across tasks or modalities, this study proposes a novel cross-linguistic perspective on reasoning generalization. This raises a crucial question: Does the reasoning capability achieved from English RPT effectively transfer to other languages? We address this by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric to quantify cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly across initial model, target language, and training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, leading to diminished cross-lingual generalization. To address this, we conduct a thorough parallel training study. Experimental results yield three key findings: (1) the First-Parallel Leap, a substantial jump in performance when moving from monolingual training to just a single parallel language; (2) a predictable Parallel Scaling Law, under which cross-lingual reasoning transfer follows a power law in the number of parallel training languages; and (3) the Monolingual Generalization Gap, the discrepancy between actual monolingual performance and the power-law prediction, which indicates that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
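As an illustration of the power-law relationship described above, the following minimal sketch fits a curve of the assumed form f(n) = a * n^b to transfer scores measured with n >= 2 parallel training languages, extrapolates to n = 1, and compares the result with a monolingual score. The functional form, the function names, and every number here are assumptions made for illustration only; the data is synthetic and none of it comes from the paper.

```python
import math


def fit_power_law(ns, ys):
    """Least-squares fit of y = a * n**b in log-log space; returns (a, b).

    Assumes every n and y is strictly positive.
    """
    xs = [math.log(n) for n in ns]
    ls = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_l - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def predict(a, b, n):
    """Evaluate the fitted power law at n parallel languages."""
    return a * n ** b


# Synthetic, made-up scores generated from an exact power law (a=2.0, b=0.5).
# These are NOT results from the paper.
num_languages = [2, 3, 4, 5, 6, 7]
transfer_scores = [2.0 * n ** 0.5 for n in num_languages]

a, b = fit_power_law(num_languages, transfer_scores)

# Hypothetical monolingual (n = 1) score, also made up for illustration.
actual_monolingual = 1.5
gap = predict(a, b, 1) - actual_monolingual

print(f"a = {a:.3f}, b = {b:.3f}, gap at n=1 = {gap:.3f}")
```

Fitting in log-log space turns the power law into a straight line, so ordinary least squares is enough. A positive gap in this sketch means the monolingual score falls below what the curve fitted on parallel-language runs would predict at n = 1, which is one possible reading of the Monolingual Generalization Gap.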
Recent advancements in Reinforcement Post-Training (RPT) (Jaech et al., 2024; Kimi et al., 2025; Qwen, 2025) have emerged as a transformative paradigm for advancing the capabilities of Large Reasoning Models (LRMs). Techniques like Reinforcement Learning with Verifiable Rewards (RLVR) (Lambert et al., 2024; Guo et al., 2025) have even enabled models to surpass human-level performance on complex math reasoning benchmarks such as MATH (Hendrycks et al., 2021) and AIME (Maxwell, 2024). Given these impressive gains in the mathematical domain, a central question has emerged: Can these RL-driven reasoning abilities generalize effectively?
