Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng
In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss the interesting behavior of the resulting dialogue model and its implications.
arXiv.org Artificial Intelligence
Oct-23-2019
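The abstract names data mixing as the core idea behind mix-review: during fine-tuning, pretraining examples are periodically mixed back into the training data so the model keeps "reviewing" what it learned in pre-training. The sketch below illustrates that general idea only; the initial mix ratio, the decay schedule, and the function names are assumptions made for illustration, not the paper's exact recipe.

```python
import random

def mix_review_batches(finetune_data, pretrain_data, num_epochs, decay=0.9):
    """Illustrative sketch of data mixing during fine-tuning (hypothetical helper).

    Each epoch, a fraction of pretraining examples is mixed into the
    fine-tuning set so the model keeps reviewing the pretraining
    distribution; the fraction decays over epochs. All hyperparameters
    here are assumed, not taken from the paper.
    """
    mix_ratio = 1.0  # assumed initial ratio of reviewed pretrain examples to fine-tune examples
    for epoch in range(num_epochs):
        n_mix = int(mix_ratio * len(finetune_data))
        reviewed = random.sample(pretrain_data, min(n_mix, len(pretrain_data)))
        epoch_data = list(finetune_data) + reviewed
        random.shuffle(epoch_data)
        yield epoch, epoch_data   # train on this mixed set for one epoch
        mix_ratio *= decay        # anneal the amount of reviewed pretraining data
```

A caller would simply iterate over the generator, e.g. `for epoch, data in mix_review_batches(ft_set, pt_set, num_epochs=5): train_one_epoch(model, data)`, leaving the actual training loop unchanged.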