Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Neural Information Processing Systems
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs).
Oct-10-2025, 17:53:41 GMT