LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

Neural Information Processing Systems 

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g.
