Rule Based Rewards for Language Model Safety

Neural Information Processing Systems 

We propose a novel preference modeling approach that uses AI feedback and requires only a small amount of human data.
