Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng
In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss the interesting behavior of the resulting dialogue model and its implications.
arXiv.org Artificial Intelligence
Oct-23-2019
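The abstract names data mixing as the core idea behind mix-review: during fine-tuning, pretraining examples are periodically mixed back into the training data so the model keeps "reviewing" what it learned in pre-training. The sketch below illustrates that general idea only; the initial mix ratio, the decay schedule, and the function names are assumptions made for illustration, not the paper's exact recipe.

```python
import random

def mix_review_batches(finetune_data, pretrain_data, num_epochs, decay=0.9):
    """Illustrative sketch of data mixing during fine-tuning (hypothetical helper).

    Each epoch, a fraction of pretraining examples is mixed into the
    fine-tuning set so the model keeps reviewing the pretraining
    distribution; the fraction decays over epochs. All hyperparameters
    here are assumed, not taken from the paper.
    """
    mix_ratio = 1.0  # assumed initial ratio of reviewed pretrain examples to fine-tune examples
    for epoch in range(num_epochs):
        n_mix = int(mix_ratio * len(finetune_data))
        reviewed = random.sample(pretrain_data, min(n_mix, len(pretrain_data)))
        epoch_data = list(finetune_data) + reviewed
        random.shuffle(epoch_data)
        yield epoch, epoch_data   # train on this mixed set for one epoch
        mix_ratio *= decay        # anneal the amount of reviewed pretraining data
```

A caller would simply iterate over the generator, e.g. `for epoch, data in mix_review_batches(ft_set, pt_set, num_epochs=5): train_one_epoch(model, data)`, leaving the actual training loop unchanged.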