math problem
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Research Report (0.68)
- Workflow (0.46)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Iowa (0.04)
- North America > Canada (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
A New AI Math Startup Just Cracked 4 Previously Unsolved Problems
Axiom says its AI found solutions to several long-standing math problems, a sign of the technology's steadily advancing reasoning capabilities. Five years ago, mathematicians Dawei Chen and Quentin Gendron were trying to untangle a difficult area of algebraic geometry involving differentials, elements of calculus used to measure distance along curved surfaces . While working on one theorem, they ran into an unexpected roadblock: Their argument depended on a strange formula from number theory, but they were unable to solve or justify it. In the end, Chen and Gendron wrote a paper presenting their idea as a conjecture, rather than a theorem. Chen recently spent hours prompting ChatGPT in the hopes of getting the AI to come up with a solution to the still unsolved problem, but it wasn't working.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.25)
- North America > United States > California (0.06)
- Asia > China (0.05)
- (5 more...)
- Information Technology (0.70)
- Government > Military (0.48)
- Oceania > New Zealand > North Island > Waikato (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (3 more...)
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans?
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
Mathematical reasoning is an important capability of large language models~(LLMs) for real-world applications.To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (\eg GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis.To reduce the cost, based on open-source available texts, we propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data.To achieve it, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM.Concretely, we craft a set of prompts based on human education stages to guide GPT-4, to synthesize problems covering diverse math knowledge and difficulty levels.Besides, we adopt the gradient-based influence estimation method to select the most valuable math-related texts.The both are fed into GPT-4 for creating the knowledge distillation dataset to train the small LLM.We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0
Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
Mahran, Mariam, Simbeck, Katharina
Large Language Models (LLMs) are increasingly used for educational support, yet their response quality varies depending on the language of interaction. This paper presents an automated multilingual pipeline for generating, solving, and evaluating math problems aligned with the German K-10 curriculum. We generated 628 math exercises and translated them into English, German, and Arabic. Three commercial LLMs (GPT-4o-mini, Gemini 2.5 Flash, and Qwen-plus) were prompted to produce step-by-step solutions in each language. A held-out panel of LLM judges, including Claude 3.5 Haiku, evaluated solution quality using a comparative framework. Results show a consistent gap, with English solutions consistently rated highest, and Arabic often ranked lower. These findings highlight persistent linguistic bias and the need for more equitable multilingual AI systems in education.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (4 more...)
Eliciting Reasoning in Language Models with Cognitive Tools
Ebouky, Brown, Bartezzaghi, Andrea, Rigotti, Mattia
The recent advent of reasoning models like OpenAI's o1 was met with excited speculation by the AI community about the mechanisms underlying these capabilities in closed models, followed by a rush of replication efforts, particularly from the open source community. These speculations were largely settled by the demonstration from DeepSeek-R1 that chains-of-thought and reinforcement learning (RL) can effectively replicate reasoning on top of base LLMs. However, it remains valuable to explore alternative methods for theoretically eliciting reasoning that could help elucidate the underlying mechanisms, as well as providing additional methods that may offer complementary benefits. Here, we build on the long-standing literature in cognitive psychology and cognitive architectures, which postulates that reasoning arises from the orchestrated, sequential execution of a set of modular, predetermined cognitive operations. Crucially, we implement this key idea within a modern agentic tool-calling framework. In particular, we endow an LLM with a small set of "cognitive tools" encapsulating specific reasoning operations, each executed by the LLM itself. Surprisingly, this simple strategy results in considerable gains in performance on standard mathematical reasoning benchmarks compared to base LLMs, for both closed and open-weight models. For instance, providing our "cognitive tools" to GPT-4.1 increases its pass@1 performance on AIME2024 from 32% to 53%, even surpassing the performance of o1-preview. In addition to its practical implications, this demonstration contributes to the debate regarding the role of post-training methods in eliciting reasoning in LLMs versus the role of inherent capabilities acquired during pre-training, and whether post-training merely uncovers these latent abilities.
- Workflow (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Research Report (0.68)
- Instructional Material > Course Syllabus & Notes (0.50)
Simple Vision-Language Math Reasoning via Rendered Text
Skripkin, Matvey, Goncharova, Elizaveta, Kuznetsov, Andrey
We present a lightweight yet effective pipeline for training vision-language models to solve math problems by rendering LaTeX encoded equations into images and pairing them with structured chain-of-thought prompts. This simple text-to-vision augmentation enables compact multimodal architectures to achieve state-of-the-art reasoning accuracy. Through systematic ablations, we find that rendering fidelity and prompt design are the primary drivers of performance. Despite its simplicity, our approach consistently matches or surpasses both open-source and proprietary math-focused vision-language solvers on widely used benchmarks, while preserving broad general-domain competence - showing gains on tasks such as MMMU, ChartQA, and DocVQA of up to 20%.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)