Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning
Louis Castricato, Alexander Havrilla, Shahbuland Matiana, Michael Pieler, Anbang Ye, Ian Yang, Spencer Frazier, Mark Riedl
– arXiv.org Artificial Intelligence
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences. Existing methods to control for story preference rely on prompt engineering, which is labor-intensive and often inconsistent, or on logit-manipulation methods, which require annotated datasets for the desired attributes. To address these issues, we first train a contrastive bi-encoder model to align stories with corresponding human critiques, named CARP, building a general-purpose preference model. This is subsequently used as a reward function to fine-tune a generative language model via reinforcement learning. However, simply fine-tuning a generative language model with a contrastive reward model does not always reliably result in a story generation system capable of generating stories that meet user preferences. To increase story generation robustness, we further fine-tune the contrastive reward model using a prompt-learning technique.

Figure 1: Illustration of our technique for generating story content controlled by preferences. A language model generates candidates, which are ranked by the CARP model to produce scores. The scores are used to fine-tune the language model to produce higher scoring, and thus more preference-aligned, stories.
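To make the pipeline concrete, below is a minimal sketch, not the authors' released code, of the two pieces the abstract describes: a CARP-style contrastive bi-encoder that scores how well a story matches a natural-language critique or preference, and the use of that score as a per-candidate reward for the reinforcement-learning fine-tuning step. All class names, the `roberta-base` backbone, the projection dimension, and the helper functions are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a CARP-style bi-encoder preference model and its use as a
# reward function. Names and hyperparameters are assumptions, not the
# paper's actual configuration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


class BiEncoderPreferenceModel(torch.nn.Module):
    """Two text encoders (stories, critiques) trained CLIP-style so that
    matching story/critique pairs get high similarity scores."""

    def __init__(self, backbone: str = "roberta-base", dim: int = 512):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(backbone)
        self.story_encoder = AutoModel.from_pretrained(backbone)
        self.critique_encoder = AutoModel.from_pretrained(backbone)
        hidden = self.story_encoder.config.hidden_size
        self.story_proj = torch.nn.Linear(hidden, dim)
        self.critique_proj = torch.nn.Linear(hidden, dim)
        # Learnable temperature, initialized to ln(1/0.07) as in CLIP.
        self.logit_scale = torch.nn.Parameter(torch.tensor(2.659))

    def _embed(self, encoder, proj, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        # Use the first-token representation as the sequence embedding.
        hidden = encoder(**batch).last_hidden_state[:, 0]
        return F.normalize(proj(hidden), dim=-1)

    def forward(self, stories, critiques):
        s = self._embed(self.story_encoder, self.story_proj, stories)
        c = self._embed(self.critique_encoder, self.critique_proj, critiques)
        # Entry (i, j) scores story i against critique j.
        return self.logit_scale.exp() * s @ c.T


def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss: diagonal entries are the true story/critique
    pairs; off-diagonal entries act as in-batch negatives."""
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


@torch.no_grad()
def preference_reward(model, candidate_stories, preference: str):
    """Score each candidate story against a single user preference. These
    scores are the rewards an RL loop would use to fine-tune the generator."""
    logits = model(candidate_stories, [preference])  # shape (N, 1)
    return logits.squeeze(-1)                        # one reward per candidate
```

As the figure describes, the generator proposes several candidate continuations, `preference_reward` ranks them, and the resulting scores drive the policy update; the contrastive loss is only used earlier, when training the preference model on story/critique pairs.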
Dec-15-2022