AITopics

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Neural Information Processing Systems, Mar-27-2025, 12:34:45 GMT

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods is reward over-optimization or reward hacking, where performance as measured by the learned proxy reward model increases, but true quality plateaus or even deteriorates. Direct Alignment Algorithms (DAAs) like Direct Preference Optimization have emerged as alternatives to the classical RLHF pipeline by circumventing the reward modeling phase. However, although DAAs do not use a separate proxy reward model, they still commonly deteriorate from over-optimization. While the so-called reward hacking phenomenon is not well-defined for DAAs, we still uncover similar trends: at higher KL budgets, DAAs exhibit similar degradation patterns to their classic RLHF counterparts. In particular, we find that DAA methods deteriorate not only across a wide range of KL budgets but also often before even a single epoch of the dataset is completed. Through extensive empirical experimentation, this work formulates and formalizes the reward over-optimization or hacking problem for DAAs and explores its consequences across objectives, training regimes, and model scales.
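The abstract names Direct Preference Optimization without stating its objective. As a rough illustration only, the sketch below computes the commonly cited per-pair DPO loss, -log sigmoid(beta * margin), where the margin is the difference between the chosen and rejected log-probability ratios against a reference model. The function name, argument names, and the default beta are assumptions made for this example and are not taken from the paper.

```python
import math


def dpo_pair_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities (illustrative sketch).

    beta=0.1 is an assumed default, not a value reported in the abstract.
    """
    # Implicit reward of each response: beta * log(pi / pi_ref).
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's log-probability relative to the rejected one.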

large language model, machine learning, trajectory, (18 more...)

Country: North America > United States > Massachusetts (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

b14d7175755b180dc2163e15e3110cb6-Supplemental-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 12:34:32 GMT

artificial intelligence, hyperparameter, machine learning, (15 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

Parameter-free Dynamic Graph Embedding for Link Prediction

Neural Information Processing Systems, Mar-27-2025, 12:34:28 GMT

Dynamic interaction graphs have been widely adopted to model the evolution of user-item interactions over time. There are two crucial factors when modelling user preferences for link prediction in dynamic interaction graphs: 1) collaborative relationships among users and 2) users' personalized interaction patterns. Existing methods often implicitly consider these two factors together, which may lead to noisy user modelling when the two factors diverge. In addition, they usually require time-consuming parameter learning with back-propagation, which is prohibitive for real-time user preference modelling. To this end, this paper proposes FreeGEM, a parameter-free dynamic graph embedding method for link prediction. Firstly, to take advantage of the collaborative relationships, we propose an incremental graph embedding engine to obtain user/item embeddings, which is an Online-Monitor-Offline architecture consisting of an Online module to approximately embed users/items over time, a Monitor module to estimate the approximation error in real time and an Offline module to calibrate the user/item embeddings when the online approximation errors exceed a threshold. Meanwhile, we integrate attribute information into the model, which enables FreeGEM to better model users belonging to some underrepresented groups. Secondly, we design a personalized dynamic interaction pattern modeller, which combines dynamic time decay with an attention mechanism to model user short-term interests. Experimental results on two link prediction tasks show that FreeGEM can outperform the state-of-the-art methods in accuracy while achieving over 36X improvement in efficiency.

artificial intelligence, data mining, machine learning, (19 more...)

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Slight Corruption in Pre-training Data Makes Better Diffusion Models

Neural Information Processing Systems, Mar-27-2025, 12:34:28 GMT

Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audio, and video. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such condition corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs.

large language model, machine learning, natural language, (14 more...)

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

Neural Information Processing Systems, Mar-27-2025, 12:34:16 GMT

The notable success of large language models (LLMs) has sparked an upsurge in building language agents to complete various complex tasks.

large language model, machine learning, natural language, (21 more...)

Country:

North America > United States (1.00)
Asia (1.00)
North America > Mexico > Mexico City (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.92)
Media (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

916cb4e1aeafaa0757953c9bacd17337-Supplemental-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 12:34:11 GMT

artificial intelligence, cvpr, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

916cb4e1aeafaa0757953c9bacd17337-Paper-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 12:34:05 GMT

machine learning, natural language, segmentation, (18 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Improved Algorithms for Contextual Dynamic Pricing

Solenne Gaucher

Neural Information Processing Systems, Mar-27-2025, 12:33:56 GMT

In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations linearly depend on the context and are further distorted by noise.
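The interaction described above can be sketched as a small simulation: the buyer's valuation is linear in the context plus noise, and the seller earns the posted price only when it does not exceed that valuation. Everything here is an assumption for illustration, including the function names, the uniform contexts, the Gaussian noise, and the choice to count a price equal to the valuation as a sale. It is not the algorithm proposed in the paper.

```python
import random


def valuation(theta, context, noise=0.0):
    """Linear valuation: inner product of theta and context, plus noise."""
    return sum(t * c for t, c in zip(theta, context)) + noise


def revenue(price, value):
    """Seller earns the price if the buyer accepts, otherwise nothing.

    Treating price == value as a sale is a tie-breaking assumption.
    """
    return price if price <= value else 0.0


def simulate(theta, price_fn, rounds=1000, noise_scale=0.1, seed=0):
    """Total revenue of a pricing rule price_fn(context) over several rounds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        context = [rng.random() for _ in theta]  # assumed context distribution
        value = valuation(theta, context, rng.gauss(0.0, noise_scale))
        total += revenue(price_fn(context), value)
    return total
```

A pricing rule can then be compared against another by calling, for example, `simulate([1.0, 0.5], lambda x: 0.8 * valuation([1.0, 0.5], x))` with the same seed for both rules.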

artificial intelligence, data mining, machine learning, (17 more...)

Country: Europe (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Data Science > Data Mining > Big Data (0.46)

A Human Evaluation Details

Neural Information Processing Systems, Mar-27-2025, 12:33:38 GMT

A.1 Unlearning Toxicity Human Eval Details

We conduct human evaluation on 100 random prompts from the test set of R. For each prompt, we compare 6 pairs of models: Quark versus other baselines, as shown in Table 2. In total we have 1200 comparisons, and each comparison is rated by 3 raters. We ran a qualification test to select qualified raters and ensure the quality and reliability of the evaluation process. Following the setting of [40], given a comparison of generations, the raters were asked three questions: 1. toxicity: which one is less rude, disrespectful or unreasonable?

A.2 Unlearning Sentiment Human Eval Details

Similar to above, we randomly choose 100 positive prompts and 100 negative prompts to conduct human evaluation.

artificial intelligence, lieutenant colonel, machine learning, (13 more...)

Country: North America > United States (0.95)

Industry:

Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Quark: Controllable Text Generation with Reinforced [Un]learning

Ximing Lu, Sean Welleck, Jack Hessel

Neural Information Processing Systems, Mar-27-2025, 12:33:36 GMT

Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining near the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO [66], while relying only on standard language modeling primitives.
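Step (ii) of the loop described above, sorting samples into reward quantiles and prepending a reward token, can be sketched as follows. This is a minimal illustration, assuming equal-sized quantiles and a hypothetical token format `<rq_k>`; the sampling step, the language modeling loss, and the KL-divergence penalty are not shown, and none of the names come from the paper.

```python
def assign_reward_tokens(samples, rewards, num_quantiles=5):
    """Prefix each sample with the token of its reward quantile.

    Quantile 0 holds the lowest-reward samples and num_quantiles - 1 the
    highest. The "<rq_k>" token format is a hypothetical placeholder.
    """
    if len(samples) != len(rewards):
        raise ValueError("samples and rewards must have the same length")
    n = len(samples)
    # Indices of the samples ordered from lowest to highest reward.
    order = sorted(range(n), key=lambda i: rewards[i])
    tagged = [None] * n
    for rank, i in enumerate(order):
        # Map the rank to one of num_quantiles roughly equal-sized buckets.
        quantile = rank * num_quantiles // n
        tagged[i] = f"<rq_{quantile}> {samples[i]}"
    return tagged
```

At generation time, the abstract's idea of conditioning on a high-reward token would correspond to prefixing the prompt with the top bucket's token, here `<rq_4>` under the assumed default of five quantiles.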

computational linguistic, machine learning, reinforcement learning, (20 more...)

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)
