polymath
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Wang, Yiming, Zhang, Pei, Tang, Jialong, Wei, Haoran, Yang, Baosong, Wang, Rui, Sun, Chenshu, Sun, Feitong, Zhang, Jiran, Wu, Junxuan, Cang, Qiqian, Zhang, Yichang, Huang, Fei, Lin, Junyang, Huang, Fei, Zhou, Jingren
In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen-3-235B-A22B-Thinking and Gemini-2.5-pro achieve only 54.6 and 52.2 benchmark scores, with about 40% accuracy under the highest level. From a language perspective, our benchmark reveals several key challenges of LLMs in multilingual reasoning: (1) Reasoning performance varies widely across languages for current LLMs; (2) Input-output language consistency is low in reasoning LLMs and may be correlated with performance; (3) The thinking length differs significantly by language for current LLMs. Additionally, we demonstrate that controlling the output language in the instructions has the potential to affect reasoning performance, especially for some low-resource languages, suggesting a promising direction for improving multilingual capabilities in LLMs.
Towards Machine Learning in Pharo: Visualizing Linear Regression
This is a small tutorial on how to estimate house prices in Pharo using a linear regression model from PolyMath. We will then visualize the data points together with the regression line using the new charting capabilities of Roassal3. The main purpose of this blog post is to demonstrate the new charting functionality of Roassal3 that was introduced yesterday. The visualization that we will build is not very pretty, but it will give you a taste of the amazing things that we will be able to do in the near future. Pharo is a pure object-oriented programming language and a powerful environment, focused on simplicity and immediate feedback (think IDE and OS rolled into one).
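To make the idea of fitting a regression line to house prices concrete, here is a minimal sketch in Python rather than Pharo. It does not use the PolyMath library or Roassal3, and it is not the tutorial's code. The function name fit_line and the area/price figures are hypothetical, chosen only for illustration. It fits y = slope * x + intercept by ordinary least squares on a single feature.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area (square metres) and price (thousands).
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 200.0, 250.0, 300.0]

slope, intercept = fit_line(areas, prices)

# Estimated price of a hypothetical 100 square metre house.
predicted = slope * 100.0 + intercept
```

The fitted slope and intercept are the two numbers needed to draw the regression line through a scatter plot of the data points.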
Exploiting Problem Structure in Combinatorial Landscapes: A Case Study on Pure Mathematics Application
Xie, Xiao-Feng, Wang, Zun-Jing
In this paper, we present a method using AI techniques to solve a pure mathematics problem: finding narrow admissible tuples. The original problem is formulated as a combinatorial optimization problem. In particular, we show how to exploit the local search structure to formulate the problem landscape for dramatic reductions in search space and for non-trivial elimination of search barriers, and then to realize intelligent search strategies for effectively escaping from local minima. Experimental results demonstrate that the proposed method is able to efficiently find the best known solutions. This research sheds light on exploiting the local problem structure for an efficient search in combinatorial landscapes as an application of AI to a new problem domain.
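The abstract does not define its terms, so the sketch below assumes the standard number-theoretic definition: a tuple of integers is admissible if, for every prime p, it misses at least one residue class modulo p. It also assumes that "narrow" refers to a small diameter (largest element minus smallest). This is only a checker for candidate tuples, not the authors' search method, and the function names are hypothetical.

```python
def primes_up_to(n):
    """Return all primes <= n using a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def is_admissible(offsets):
    """Check that the tuple misses a residue class modulo every prime.

    A tuple with k elements can only cover all p classes when p <= k,
    so only primes up to k need to be tested.
    """
    k = len(offsets)
    for p in primes_up_to(k):
        if len({h % p for h in offsets}) == p:
            return False
    return True


def diameter(offsets):
    """Width of the tuple: largest element minus smallest element."""
    return max(offsets) - min(offsets)
```

For example, is_admissible([0, 2, 6]) holds, while is_admissible([0, 2, 4]) does not, because 0, 2 and 4 cover every residue class modulo 3.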
Let's Bring The Polymath -- and the Dabblers -- Back
I noticed recently that books with the phrase "The Last Man Who Knew Everything" have something in common: their subjects all lived during the period close to the Scientific Revolution, roughly between 1550 and 1700. It's as if the Scientific Revolution -- and the knowledge it spawned -- killed the ability to Know Everything. Before then, it was not only possible to be a generalist or polymath (someone with a wide range of expertise) -- but the weaving together of different disciplines was actually rather unexceptional. The Ancients discussed topics such as ethics, biology, and metaphysics alongside each other.