Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Neural Information Processing Systems
In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address uncertainty in preference strength.
Feb-17-2026, 23:21:17 GMT
- Country:
- North America > United States > Texas > Brazos County > College Station (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: