LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

Jun-13-2026, 18:24:18 GMT–Neural Information Processing Systems

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g.

artificial intelligence, large language model, natural language


Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)