Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Sriyash Poddar, Yanming Wan

Neural Information Processing Systems 

While conceptually simple, we show that in practice, this reward modeling requires careful algorithmic considerations around model architecture and reward scaling.
