Escaping the Verifier: Learning to Reason via Demonstrations
Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), which learns strong reasoning capabilities from expert demonstrations alone via Inverse Reinforcement Learning. Our method sets up an adversarial game between a policy and a relativistic critic: the policy learns to mimic expert answers, while the critic aims to identify the expert among (expert, policy) answer pairs. The policy and the critic are trained jointly and continuously via RL, and we identify the key stabilization techniques required for robust learning. Empirically, RARO significantly outperforms strong verifier-free baselines on all of our evaluation tasks -- Countdown, DeepMath, and Poetry Writing -- and enjoys the same robust scaling trends as RL with verifiers. These results demonstrate that our method effectively elicits strong reasoning performance from expert demonstrations alone, enabling robust reasoning learning even when task-specific verifiers are unavailable.

Recent advances in Large Language Models (LLMs) have been driven substantially by improvements in their reasoning abilities. Reasoning enables LLMs to perform deliberate intermediate computations before producing answers to user queries, proposing candidate solutions and self-corrections. Much of this progress has been enabled by Reinforcement Learning (RL) on verifiable tasks such as mathematics and competitive programming (DeepSeek-AI et al., 2025; Yang et al., 2025a; Shao et al., 2024; Luo et al., 2025). Notably, recent work has demonstrated that RL with Verifiable Rewards (RLVR) can enable LLMs to develop robust reasoning capabilities without any additional supervision (DeepSeek-AI et al., 2025).
A growing body of work further improves the efficiency and stability of such RL algorithms on verifiable tasks, such as DAPO (Yu et al., 2025) and GSPO (Zheng et al., 2025). However, comparatively little attention has been paid to developing reasoning abilities on non-verifiable tasks, where task-specific verifiers are unavailable. Yet, in many impactful and challenging tasks -- such as analytical writing, open-ended research, or financial analysis -- LLM outputs are not directly verifiable due to hard-to-specify criteria, wide variation among acceptable answers, and other practical constraints. A popular approach in these settings is Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022; Rafailov et al., 2023), but it requires collecting human preferences beyond demonstration data, which is often a time-consuming and expensive process.
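The (expert, policy) pairing described in the abstract can be made concrete with a toy sketch. The excerpt does not give RARO's exact objective, so the following assumes a relativistic GAN-style pairwise loss: the critic is trained to score the expert answer above the paired policy answer, and the policy is rewarded for closing that gap. All function names and the scalar-score setup are illustrative, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def critic_loss(score_expert, score_policy):
    """Relativistic critic objective: push expert scores above policy
    scores on paired (expert, policy) answers to the same query."""
    return -np.mean(np.log(sigmoid(score_expert - score_policy)))

def policy_reward(score_expert, score_policy):
    """Policy reward: higher when the critic can no longer tell the
    policy's answer apart from the expert's on the same pair."""
    return np.mean(np.log(sigmoid(score_policy - score_expert)))

# Toy critic scores for four paired answers.
expert = np.array([2.0, 1.5, 0.5, 1.0])
policy = np.array([0.0, 0.5, 0.2, 0.8])
print(critic_loss(expert, policy))   # small: critic separates the pairs well
print(policy_reward(expert, policy)) # negative: policy not yet expert-like
```

When the two score vectors coincide, the critic is at chance (loss log 2) and the policy reward saturates at log 1/2; joint RL training would alternate updates to both players from this kind of signal.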
DeepMath - Deep Sequence Models for Premise Selection
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, a key bottleneck for progress in formalized mathematics. We propose a two-stage approach for this task that yields good results on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
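The abstract does not spell out the two stages here, so the following is a minimal schematic of a generic two-stage premise-selection pipeline: stage 1 embeds each formal statement into a fixed-size vector (a hashed character n-gram encoder stands in for the learned sequence models of the paper), and stage 2 scores (conjecture, premise) pairs and ranks candidates. All names (`embed`, `score`, `select_premises`) are illustrative.

```python
import hashlib
import math

DIM = 64  # embedding dimension for the toy encoder

def embed(statement: str, n: int = 3) -> list[float]:
    """Stage 1 stand-in: map a statement to a unit vector of hashed
    character n-gram counts (a learned sequence encoder in practice)."""
    v = [0.0] * DIM
    for i in range(len(statement) - n + 1):
        h = int(hashlib.md5(statement[i:i + n].encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def score(conjecture_vec: list[float], premise_vec: list[float]) -> float:
    """Stage 2 stand-in: pairwise relevance score; here a dot product
    in place of a trained classifier on the two embeddings."""
    return sum(a * b for a, b in zip(conjecture_vec, premise_vec))

def select_premises(conjecture: str, premises: list[str], k: int = 2) -> list[str]:
    """Rank candidate premises for a conjecture and return the top k."""
    c = embed(conjecture)
    ranked = sorted(premises, key=lambda p: score(c, embed(p)), reverse=True)
    return ranked[:k]
```

Separating encoding from pairwise scoring lets premise embeddings be precomputed once per corpus, which matters when each conjecture must be scored against thousands of candidate premises.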
Reviews: DeepMath - Deep Sequence Models for Premise Selection
Pros:
* The problem is interesting and does not appear to have previously received attention from the NIPS community.
* The paper is clearly written.

Cons:
* The approach is not particularly novel; it is mostly a case of applying a range of existing neural network components to the premise selection problem.
* Is there a way to meaningfully compare these numbers? If not, it is confusing to include them. This presentation asks us to take the maximum of test performance over all the different neural network variants and compare that to the performance of a single baseline method. However, there is quite a lot of variance in the neural network performance relative to the difference between neural network and baseline performance.
DeepMath - Deep Sequence Models for Premise Selection
Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Een, Francois Chollet, Josef Urban
Papers published at the Neural Information Processing Systems Conference.