PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Neural Information Processing Systems
Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen-3-235B-A22B-Thinking and Gemini-2.5-pro,
Jun-11-2026, 19:52:03 GMT