Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
Neural Information Processing Systems
Reinforcement Learning from Human Feedback (RLHF) has been pivotal in aligning Large Language Models with human values but often suffers from overoptimization due to its reliance on a proxy reward model.