LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
–Neural Information Processing Systems
Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g.
Neural Information Processing Systems
Jun-21-2026, 12:31:24 GMT
- Country:
- Europe (0.92)
- North America > United States
- Minnesota (0.27)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Research Report
- Technology: