Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Neural Information Processing Systems (NeurIPS)

Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences.