Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov, Stanford University
Neural Information Processing Systems
A prominent issue with methods that optimize against a learned reward model is reward over-optimization, also called reward hacking: performance as measured by the learned proxy reward model increases, but true quality plateaus or even deteriorates.
Oct-10-2025, 19:39:33 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- California > Santa Clara County
- Palo Alto (0.04)
- Massachusetts > Hampshire County
- Amherst (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Information Technology (0.46)
- Technology: