AITopics | metamath

Collaborating Authors

metamath

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

a8901c5e85fb8e1823bbf0f755053672-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 05:42:06 GMT

The nineteenth century saw the emergence of proofs so involved that they could only be verified by a handful of specialists.

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > North Carolina > Wake County > Morrisville (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)

Add feedback

A Proving environments

Neural Information Processing SystemsAug-17-2025, 11:39:14 GMT

D.5 Metamath Lean versions To compare our models in the same setup while working on this project, we ran all our experiments

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

a8901c5e85fb8e1823bbf0f755053672-Paper-Conference.pdf

Neural Information Processing SystemsAug-17-2025, 11:39:11 GMT

logic & formal reasoning, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > North Carolina > Wake County > Morrisville (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Pan, Zhuoshi, Li, Yu, Lin, Honglin, Pei, Qizhi, Tang, Zinan, Wu, Wei, Ming, Chenlin, Zhao, H. Vicky, He, Conghui, Wu, Lijun

arXiv.org Artificial IntelligenceMar-21-2025

Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., distilling high-quality correct solutions from advanced models, neglecting the value contained in error data, potentially hindering the model's reflective ability. Though some studies attempt to leverage error data, they often involve complex mechanisms, such as Monte Carlo Tree Search (MCTS) to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability by Learning from Errors for Mathematical Advancement (LEMMA). LEMMA constructs data consisting of an incorrect solution with an erroneous step and a reflection connection to a correct solution for fine-tuning. Specifically, we systematically analyze the model-generated error types and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are either from fixing the errors or generating a fresh start. Through a model-aware smooth reflection connection, the erroneous solution is transferred to the correct one. By fine-tuning on the constructed dataset, the model is able to self-correct errors autonomously within the generation process without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.17439

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy

Ruan, Zhiwen, Li, Yixia, Zhu, He, Wang, Longyue, Luo, Weihua, Zhang, Kaifu, Chen, Yun, Chen, Guanhua

arXiv.org Artificial IntelligenceFeb-16-2025

Despite being pretrained on multilingual corpora, large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. However, these methods typically focus on the encoder's output, overlooking valuable information from other layers. We propose \aname (\mname), a framework that integrates representations from all encoder layers, coupled with the \attaname mechanism to enable layer-wise interaction between the LLM and the multilingual encoder. Extensive experiments on multilingual reasoning tasks, along with analyses of learned representations, show that our approach consistently outperforms existing baselines.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.11405

Country:

North America (0.46)
Europe (0.46)
Asia > Thailand (0.14)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

System-2 Mathematical Reasoning via Enriched Instruction Tuning

Cai, Huanqia, Yang, Yijun, Li, Zhifeng

arXiv.org Artificial IntelligenceDec-24-2024

Solving complex mathematical problems via system-2 reasoning is a natural human skill, yet it remains a significant challenge for current large language models (LLMs). We identify the scarcity of deliberate multi-step reasoning data as a primary limiting factor. To this end, we introduce Enriched Instruction Tuning (EIT), a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback to create fine-grained reasoning trajectories. These datasets are then used to fine-tune open-source LLMs, enhancing their mathematical reasoning abilities without reliance on any symbolic verification program. Concretely, EIT is composed of two critical steps: Enriching with Reasoning Plan (ERP) and Enriching with Reasoning Step (ERS). The former generates a high-level plan that breaks down complex instructions into a sequence of simpler objectives, while ERS fills in reasoning contexts often overlooked by human annotators, creating a smoother reasoning trajectory for LLM fine-tuning. Unlike existing CoT prompting methods that generate reasoning chains only depending on LLM's internal knowledge, our method leverages human-annotated initial answers as ``meta-knowledge'' to help LLMs generate more detailed and precise reasoning processes, leading to a more trustworthy LLM expert for complex mathematical problems. In experiments, EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods, and even matching the performance of tool-augmented methods.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.16964

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

Pal, Arka, Karkhanis, Deep, Dooley, Samuel, Roberts, Manley, Naidu, Siddartha, White, Colin

arXiv.org Artificial IntelligenceJul-3-2024

Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the relative probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a reduction of the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure which avoids this failure mode. Surprisingly, we find that DPOP outperforms DPO and other fine-tuning procedures across a wide variety of datasets and downstream tasks, including datasets with high edit distances between completions. Furthermore, we find that the DPOP-tuned model outperforms the DPO-tuned model (all else equal) on benchmarks independent of the fine-tuning data, such as MT-Bench. Finally, using DPOP, we create and open-source Smaug-34B and Smaug-72B, with the latter becoming the first open-source LLM to surpass an average accuracy of 80% on the HuggingFace Open LLM Leaderboard.

completion, dataset, probability, (11 more...)

arXiv.org Artificial Intelligence

2402.13228

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (0.50)

Industry: Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

Ahn, Jihyun Janice, Kamoi, Ryo, Cheng, Lu, Zhang, Rui, Yin, Wenpeng

arXiv.org Artificial IntelligenceJun-26-2024

Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' discriminative capability to reduce this generative uncertainty, aiding in identifying the correct answers. Specifically, we propose and analyze three discriminative prompts: direct, inverse, and hybrid, to explore the potential of both closed-source and open-source LLMs in self-improving their generative performance on two benchmark datasets. Our insights reveal which discriminative prompt is most promising and when to use it. To our knowledge, this is the first work to systematically analyze LLMs' discriminative capacity to address generative uncertainty.

direct prompt, inverse prompt, llm, (15 more...)

arXiv.org Artificial Intelligence

2407.11017

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

Add feedback

ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Lin, Xiaohan, Cao, Qingxing, Huang, Yinya, Yang, Zhicheng, Liu, Zhengying, Li, Zhenguo, Liang, Xiaodan

arXiv.org Artificial IntelligenceMay-4-2024

Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses with the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by Figure 1: An example theorem generated by GPT-4 splitting the Metamath library into three sets: (OpenAI, 2023). GPT-4 wrongly refers to the intermediate axioms, library, and problem based on their theorem (A (B A) A A) as proving depth. We conduct extensive experiments ((A B) (A A)). In Step 4, it applies "ax-to investigate whether current LMs can 1" but obtains the wrong expression instead of correct generate theorems in the library and benefit (A (B A)) and can not derive (A A) even the problem theorems proving. The results with the incorrect Steps 4 and 5. demonstrate that high-quality ATG data facilitates models' performances on downstream

dataset, hypothesis, theorem, (15 more...)

arXiv.org Artificial Intelligence

2405.06677

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
(8 more...)

Genre:

Workflow (0.55)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Ying, Huaiyuan, Zhang, Shuo, Li, Linyang, Zhou, Zhejian, Shao, Yunfan, Fei, Zhaoye, Ma, Yichuan, Hong, Jiawei, Liu, Kuikun, Wang, Ziyi, Wang, Yudong, Wu, Zijian, Li, Shuaibin, Zhou, Fengzhe, Liu, Hongwei, Zhang, Songyang, Zhang, Wenwei, Yan, Hang, Qiu, Xipeng, Wang, Jiayu, Chen, Kai, Lin, Dahua

arXiv.org Artificial IntelligenceFeb-9-2024

The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next math LLMs or self-iteration. InternLM-Math obtains open-sourced state-of-the-art performance under the setting of in-context learning, supervised fine-tuning, and code-assisted reasoning in various informal and formal benchmarks including GSM8K, MATH, Hungary math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, codes, and data are released at \url{https://github.com/InternLM/InternLM-Math}.

dataset, language model, reasoning, (13 more...)

arXiv.org Artificial Intelligence

2402.06332

Country:

Europe > Hungary (0.24)
North America > United States > California (0.14)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback