Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Neural Information Processing Systems 

Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models.
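One way to picture combining fine-grained reward models is as a weighted sum of per-aspect scores, where changing the weights shifts which behavior is favored. The sketch below is illustrative only: the weighted-sum rule, the function `combine_rewards`, and the two toy scorers are assumptions made for this example, not the method described in the paper.

```python
from typing import Callable, Dict

# Hypothetical type: a reward model maps generated text to a scalar score.
RewardModel = Callable[[str], float]


def brevity_reward(text: str) -> float:
    """Toy scorer (assumption): penalize each whitespace-separated word."""
    return -float(len(text.split()))


def attribution_reward(text: str) -> float:
    """Toy scorer (assumption): reward text that cites a source phrase."""
    return 1.0 if "according to" in text.lower() else 0.0


def combine_rewards(
    text: str,
    reward_models: Dict[str, RewardModel],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of the individual reward-model scores (assumed rule)."""
    return sum(weights[name] * model(text) for name, model in reward_models.items())


models = {"brevity": brevity_reward, "attribution": attribution_reward}
sample = "According to the report, sales rose."

# Two weightings of the same reward models express different preferences.
concise_profile = combine_rewards(sample, models, {"brevity": 0.5, "attribution": 2.0})
sourced_profile = combine_rewards(sample, models, {"brevity": 0.0, "attribution": 2.0})
```

With the same two scorers, the first weighting trades attribution against length, while the second ignores length entirely; in a real setup each scorer would be a trained reward model rather than a hand-written rule.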
