Mathematical Capabilities of ChatGPT
Simon Frieder, Alexis Chevalier, Ryan-Rhys Griffiths
Neural Information Processing Systems
We investigate the mathematical capabilities of two versions of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel evaluation scheme. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., mathlib, the Lean Mathematical Library), current datasets of natural-language mathematics used to benchmark language models either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets test, by using 1636 human expert evaluations, whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians.
Mar-23-2025, 17:23:59 GMT
- Country:
- Europe
- Austria (0.28)
- United Kingdom > England (0.27)
- Genre:
- Instructional Material (0.67)
- Research Report (1.00)
- Industry:
- Education > Educational Setting (0.92)
- Government (0.67)
- Information Technology (0.92)
- Law (1.00)