mathematics
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (SWE-Bench) and mathematics (FrontierMath). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 120 tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.58x speedup against reference solvers, including methods from packages such as SciPy, scikit-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
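The validate-then-time idea described in the abstract can be illustrated with a minimal sketch. This is not AlgoTune's actual framework: `reference_solver`, `candidate_solver`, `best_time` and `evaluate` are hypothetical names, the toy task (a sum of squares) is an assumption chosen for brevity, and the speedup is simply reference time divided by candidate time, counted only when the candidate's output matches the reference.

```python
import time

import numpy as np


def best_time(fn, arg, repeats=5):
    """Return the fastest wall-clock time of fn(arg) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def reference_solver(values):
    # Hypothetical reference implementation: sum of squares in a Python loop.
    total = 0.0
    for v in values:
        total += v * v
    return total


def candidate_solver(values):
    # Hypothetical optimized candidate: the same quantity via a dot product.
    return float(np.dot(values, values))


def evaluate(reference, candidate, problem, rtol=1e-9):
    """Check the candidate against the reference, then time both."""
    expected = reference(problem)
    got = candidate(problem)
    is_valid = bool(np.isclose(got, expected, rtol=rtol))
    if not is_valid:
        return False, None  # an incorrect solution earns no speedup
    speedup = best_time(reference, problem) / best_time(candidate, problem)
    return True, speedup


problem = np.random.default_rng(0).standard_normal(20_000)
is_valid, speedup = evaluate(reference_solver, candidate_solver, problem)
print(f"valid={is_valid}, speedup={speedup:.1f}x")
```

Taking the minimum over repeated runs is one common way to reduce timing noise; the benchmark itself may measure time differently.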
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions, failing to capture the nature of mathematics encountered in actual research environments. We introduce RealMath, a novel benchmark derived directly from research papers and mathematical forums that assesses LLMs' abilities on authentic mathematical tasks. Our approach addresses three critical challenges: sourcing diverse research-level content, enabling reliable automated evaluation through verifiable statements, and designing a continually refreshable dataset to mitigate contamination risks. Experimental results across multiple LLMs reveal surprising capabilities in handling research mathematics compared to competition problems, suggesting current models may already serve as valuable assistants for working mathematicians despite limitations on highly challenging problems.
An AI solution to an 80‑year‑old problem has shocked mathematicians
Last week, OpenAI shocked the mathematical community by revealing that one of its internal artificial intelligence (AI) models had found a counterexample to a famous conjecture made by legendary Hungarian mathematician Paul Erdős in 1946. The planar unit distance problem, or Erdős problem 90, has intrigued mathematicians for decades. The new result is no mere curiosity. Canadian mathematician Daniel Litt described it as "the first result produced autonomously by an AI that I find interesting in itself". The breakthrough, produced with a general-purpose AI model rather than one specialised for mathematics, also highlights how AI is changing mathematical research itself.
The maths meme that has been distracting mathematicians for a century
A seemingly simple set of rules kicks off a kind of mathematical magic trick, which has kept great minds busy since the 1930s. Almost a century ago, a mathematician came up with a puzzle that was so seemingly simple and yet so fiendishly difficult that it has been distracting other mathematicians ever since. It has become a meme that jumps from brain to brain, with many people claiming to have solved it, only to have their hopes dashed as the proof unravels. And be warned - once I explain the rules, you will immediately want to start playing around with it yourself, and I take no responsibility for how much of your time you waste. It starts a bit like a magic trick.
A golden age of maths is dawning and mathematicians are freaking out
I am attempting to solve a mathematical conundrum that has stumped many of humanity's greatest thinkers. I have zero mathematical training, apart from a distant undergraduate physics degree, which should put my odds of success at slim to none. But I also have a trick up my sleeve - a kind of mathematical genie that can conjure arcane secrets seemingly out of thin air. I make a short request concerning an esoteric conjecture in number theory, then cross my fingers. Perhaps "genie" is a bit too strong - I'm simply using GPT 5.5 Pro, the latest iteration of OpenAI's flagship model. But for mathematicians, modern AI models appear to have a spark of magic.
How human error became a weapon against large language models
Alan Turing proposed a test for machine intelligence: could a computer convince a human it was human? Recently, a friend told me over coffee about some disheartening feedback she had received. "They said it was good," she said, "but that it read like it was written by AI." Knowing her, I understood immediately what had happened. Her credibility was being questioned not because her work was poor, but because it was too good - too clear, too fluent, too polished. The rapid acceleration of artificial intelligence tools is changing how we think about good writing.
Start-ups are racing to revolutionise mathematics with AI
Mathematicians have never been so sought after by the world's richest people. At universities across the world, academics are seeing their colleagues mysteriously disappear and join private companies. Some of these companies are household names, like OpenAI and Google, but others are newly formed and just months old, hoping to capitalise on a moment in which mathematics is seen as the secret ingredient with which to improve artificial intelligence - which may in turn transform mathematics itself. "Last May, I was honestly kind of grieving for my scientific identity," says Ken Ono, who in 2025 went on leave from a professorship at the University of Virginia to join Axiom Math, a start-up aiming to build a maths-focused AI. Ono had been asked by a different company, called Epoch AI, to help craft a set of hard-to-solve maths problems that would test AI's problem-solving ability.
On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note
Zou, Guangyi, Vershynin, Roman
Simone Bombari asked us whether the 1-bit quantized random vector Y = sgn(Wx) has subgaussian norm bounded by a universal constant. Here W is an n × n random Gaussian matrix, and x is an independent standard normal random vector in R^n. The question is nontrivial since the coordinates of Y are not independent. We give a strong positive answer to this question - for any bounded map instead of sgn(·) - using AI. AI Discovery and Generalization (Theorem 1): To handle coordinate dependence, Gemini 3.5 Flash proposed decomposing the Gaussian vector into independent parts, using one part to "smooth" the sign function, and then applying Gaussian concentration for Lipschitz functions.
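For readers unfamiliar with the term, the question above can be written out under the usual textbook convention for the subgaussian norm. The note may normalize differently; the entries of W being i.i.d. N(0,1) and the symbol C for the universal constant are assumptions here.

```latex
\[
\|X\|_{\psi_2} = \inf\bigl\{ t > 0 : \mathbb{E}\exp\bigl(X^2/t^2\bigr) \le 2 \bigr\},
\qquad
\|Y\|_{\psi_2} = \sup_{u \in S^{n-1}} \bigl\| \langle Y, u \rangle \bigr\|_{\psi_2},
\]
\[
\text{Question: is there an absolute constant } C \text{ such that }
\bigl\| \operatorname{sgn}(Wx) \bigr\|_{\psi_2} \le C \text{ for every } n \,?
\]
```

Here sgn is applied coordinate by coordinate and S^{n-1} is the unit sphere in R^n. If the coordinates of Y were independent, a bound of this kind would follow from standard results on sums of independent bounded random variables, which is why the dependence is the crux.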
OpenAI makes breakthrough on 80-year-old maths problem
If you take a sheet of paper and add some dots, how many pairs can be the same distance apart? OpenAI has claimed a further advance in AI reasoning after its technology successfully tackled an 80-year-old maths problem. The company behind ChatGPT said it had made a breakthrough with a challenge first posed by Hungarian mathematician Paul Erdős in 1946: the planar unit distance problem. The question posed by Erdős is simple to explain.
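To make the question concrete, here is a minimal sketch that counts how many pairs of dots sit exactly one unit apart. The function name `count_unit_distance_pairs` and the 4 × 4 square grid are illustrative assumptions; the grid is not the best known arrangement and has nothing to do with the construction found by OpenAI's model. Integer coordinates are used so that the distance check is exact.

```python
from itertools import combinations


def count_unit_distance_pairs(points):
    """Count unordered pairs of points whose distance is exactly 1."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1  # squared distance, no floats
    )


# 16 dots arranged as a 4 x 4 square grid with spacing 1.
grid = [(x, y) for x in range(4) for y in range(4)]
count = count_unit_distance_pairs(grid)
print(count)  # 24: twelve horizontal neighbours plus twelve vertical ones
```

A k × k grid like this gives 2k(k-1) unit-distance pairs. The unit distance problem asks how large the count can be made when the n dots may be placed anywhere in the plane.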
Mathematicians stunned by AI's biggest breakthrough in mathematics yet
An 80-year-old maths conjecture that has eluded the world's greatest mathematicians has been cracked by an artificial intelligence model built by OpenAI. The result has stunned experts and is being hailed as a seismic moment for AI's mathematical ability. "This is a problem that I didn't expect to see solved in my lifetime," says Misha Rudnev at the University of Bristol, UK. "It's absolutely a bomb." Tim Gowers at the University of Cambridge wrote that the solution is "a milestone in AI mathematics" in a blog post accompanying the work. "If a human had written the paper and submitted it to the and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation. No previous AI-generated proof has come close to that."