Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
