Distributional Preference Alignment of LLMs via Optimal Transport

Igor Melnyk

Neural Information Processing Systems 

[Ouyang et al., 2022, Bai et al., 2022], achieves this by learning a reward model on human preference
