AITopics | vineppo

Collaborating Authors

vineppo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LILO: Learning to Reason at the Frontier of Learnability

Neural Information Processing SystemsJun-16-2026, 22:41:52 GMT

Reinforcement learning is a widely adopted component of large language model post-training, especially for reasoning-style tasks such as maths questions. However, as we show, most existing methods will provably fail to learn from questions that are too hard, where the model always fails, or too easy, where the model always succeeds. Much human effort is therefore spent producing datasets of questions of a suitable difficulty for state-of-the-art models. Given this, we consider how to algorithmically identify questions that allow for maximally efficient training. We introduce a method, LILO (Learnability Improves LLMs Optimally), that prioritises training on questions with high variance of success, known as learnability, and we provide theory which shows that LILO enables the expected improvement of the model to be large. We run a wide range of experiments over multiple base models, algorithms and reasoning datasets to demonstrate that LILO consistently reaches a higher final test accuracy, and can do so in 3 fewer training steps. We explore how questions with high learnability can be efficiently identified, and discuss how learnability can be scaled to produce LLM agents that autonomously and open-endedly expand the frontier of human knowledge.

large language model, learnability, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Learning to Reason at the Frontier of Learnability

Foster, Thomas, Foerster, Jakob

arXiv.org Artificial IntelligenceFeb-17-2025

Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question many times during a single training step and attempt to learn from their successes and failures. However, we demonstrate that throughout training with two popular algorithms (PPO and VinePPO) on two widely used datasets, many questions are either solved by all attempts - meaning they are already learned - or by none - providing no meaningful training signal. To address this, we adapt a method from the reinforcement learning literature - sampling for learnability - and apply it to the reinforcement learning stage of LLM training. Our curriculum prioritises questions with high variance of success, i.e. those where the agent sometimes succeeds, but not always. Our findings demonstrate that this curriculum consistently boosts training performance across multiple algorithms and datasets, paving the way for more efficient and effective reinforcement learning in LLMs.

large language model, learnability, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.12272

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, Roux, Nicolas Le

arXiv.org Artificial IntelligenceOct-2-2024

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x), less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLM and demonstrate VinePPO's potential as a superior alternative.

credit assignment, dataset, vineppo, (14 more...)

arXiv.org Artificial Intelligence

2410.01679

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Quebec > Montreal (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(10 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback