Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

Rafael Rafailov, Stanford University

Neural Information Processing Systems 

A prominent issue with such alignment methods is reward over-optimization, or reward hacking, where performance as measured by the learned proxy reward model increases while true quality plateaus or even deteriorates.
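A minimal toy sketch (not from the paper) can make the over-optimization pattern concrete: proxy reward rises monotonically with optimization pressure, while the true ("gold") reward peaks and then deteriorates. The functional forms and constants below are illustrative assumptions, loosely following the d(α − βd) shape used in prior reward-model scaling-law work, with d standing for the square root of the KL divergence from the initial policy.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper).
alpha, beta = 1.0, 0.25

# d = sqrt(KL from the initial policy), a proxy for optimization pressure.
d = np.linspace(0.0, 8.0, 200)

proxy_reward = alpha * d               # proxy reward keeps increasing
gold_reward = d * (alpha - beta * d)   # gold reward peaks, then declines

peak = d[np.argmax(gold_reward)]
print(f"gold reward peaks near d = {peak:.2f}, then deteriorates")
print(f"at the end: proxy = {proxy_reward[-1]:.2f}, gold = {gold_reward[-1]:.2f}")
```

Under these assumed forms the gold reward peaks at d = α/(2β) and turns negative beyond d = α/β, while the proxy never signals the decline, which is the hallmark of reward hacking.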
