Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
