Noise Contrastive Alignment of Language Models with Explicit Rewards

Huayu Chen

Neural Information Processing Systems 

User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs).
