Learning to Summarize with Human Feedback
Note that our human feedback models generate summaries that are significantly shorter than summaries from models trained on CNN/DM. At a given summary length, our 6.7B human feedback model trained on Reddit performs almost as well as a fine-tuned 11B T5 model, despite not being re-trained on CNN/DM. To test our models' generalization, we also applied them directly to the popular CNN/DM news dataset. These articles are more than twice as long as Reddit posts and are written in a very different style. Our models have seen news articles during pre-training, but all of our human data and RL fine-tuning was on the Reddit TL;DR dataset.
Sep-7-2020, 06:20:28 GMT
- Country:
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Technology: