AITopics | mathador-lm

Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

Eldar Kurtic, Amir Moeini, Dan Alistarh

arXiv.org Artificial Intelligence, Jun-19-2024

The ability of large language models (LLMs) to tackle non-trivial tasks involving both information retrieval and mathematical reasoning has led to significant research interest in evaluating these capabilities. Yet the popularity of reasoning benchmarks such as the widely used Grade-School Math (GSM) [1] and MATH [2] datasets is leading to performance saturation (see Figure 1) and may also lead to training-set contamination. There is therefore a pressing need for new, strong benchmarks of LLM reasoning. We address this need with Mathador-LM, a new benchmark for examining the mathematical reasoning abilities of LLMs. At a high level, Mathador-LM follows the popular Mathador mathematical game [3], in which a human player is given five base numbers and a target number and must produce a sequence of calculations, each using one of the four basic arithmetic operations, that arrives at the target number.
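
To make the game concrete, here is a minimal Python sketch of a Mathador instance checker and a brute-force solver. It assumes rules that the abstract does not spell out: each base number and each intermediate result can be used at most once, every intermediate result must be a positive integer (so subtraction must stay positive and division must be exact), and a solution need not use all five numbers. The names `apply_op`, `check_solution` and `solve` are hypothetical illustrations, not the Mathador-LM code, and the benchmark's own scoring is not modeled.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# One calculation step: (left operand, operator, right operand, claimed result).
Step = Tuple[int, str, int, int]
OPS = "+-*/"


def apply_op(a: int, op: str, b: int) -> Optional[int]:
    """Apply one of the four basic operations.

    Returns None when the result is not a positive integer
    (assumed rule: no zero, negatives, or fractions)."""
    if op == "+":
        r = a + b
    elif op == "-":
        r = a - b
    elif op == "*":
        r = a * b
    elif op == "/":
        if b == 0 or a % b != 0:
            return None  # only exact division is allowed
        r = a // b
    else:
        return None  # unknown operator
    return r if r > 0 else None


def check_solution(bases: List[int], target: int, steps: List[Step]) -> bool:
    """Verify a proposed step sequence against one game instance."""
    pool = list(bases)  # numbers still available to consume
    for a, op, b, claimed in steps:
        # Each operand must be available and is consumed once used (assumed rule).
        for x in (a, b):
            if x not in pool:
                return False
            pool.remove(x)
        r = apply_op(a, op, b)
        if r is None or r != claimed:
            return False
        pool.append(r)  # the intermediate result becomes usable
    return bool(steps) and steps[-1][3] == target


def solve(bases: List[int], target: int) -> Optional[List[Step]]:
    """Brute-force depth-first search for any sequence reaching the target."""

    def search(pool: List[int], path: List[Step]) -> Optional[List[Step]]:
        if path and path[-1][3] == target:
            return path
        # Try every ordered pair of available numbers with every operator.
        for i, j in combinations(range(len(pool)), 2):
            rest = [v for k, v in enumerate(pool) if k not in (i, j)]
            for a, b in ((pool[i], pool[j]), (pool[j], pool[i])):
                for op in OPS:
                    r = apply_op(a, op, b)
                    if r is None:
                        continue
                    found = search(rest + [r], path + [(a, op, b, r)])
                    if found is not None:
                        return found
        return None  # no branch from this pool reaches the target

    return search(list(bases), [])


steps = [(11, "+", 5, 16), (3, "-", 2, 1), (16, "+", 1, 17)]
print(check_solution([2, 3, 5, 7, 11], 17, steps))  # True
```

Exhaustive search is practical here because each step consumes two numbers and produces one, so five base numbers allow at most four steps; a checker like `check_solution` is what an automatic evaluator would need in order to grade a model's free-form answer.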

arxiv preprint arxiv, benchmark, mathador-lm

arXiv:2406.12572

Country: Europe > France (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
