Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

Xu, Runxin, Luo, Fuli, Zhang, Zhiyuan, Tan, Chuanqi, Chang, Baobao, Huang, Songfang, Huang, Fei

Sep-12-2021–arXiv.org Artificial Intelligence

Recent pretrained language models have grown from millions to billions of parameters, so the need to fine-tune an extremely large pretrained model on a limited training corpus arises in various downstream tasks. In this paper, we propose a straightforward yet effective fine-tuning technique, Child-Tuning, which updates only a subset of the parameters (called the child network) of a large pretrained model by strategically masking out the gradients of the non-child network during the backward pass. Experiments on various downstream tasks in the GLUE benchmark show that Child-Tuning consistently outperforms vanilla fine-tuning by 1.5 to 8.6 points in average score across four different pretrained models, and surpasses prior fine-tuning techniques by 0.6 to 1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning achieves better generalization performance by large margins.
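The gradient-masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `sample_child_mask` and `masked_sgd_step`, the plain SGD update, and the random Bernoulli choice of the child network are assumptions made for illustration, since the abstract does not say how the child network is selected.

```python
import random


def sample_child_mask(n, keep_prob, rng):
    """Hypothetical helper: mark each parameter as child (1) or non-child (0).

    A random Bernoulli choice is an assumption of this sketch.
    """
    return [1 if rng.random() < keep_prob else 0 for _ in range(n)]


def masked_sgd_step(params, grads, mask, lr=0.1):
    """One SGD step in which gradients of non-child parameters are zeroed out."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


rng = random.Random(0)
params = [0.5, -1.0, 2.0, 0.0]   # toy parameter vector
grads = [0.1, 0.2, -0.3, 0.4]    # toy gradients from a backward pass
mask = sample_child_mask(len(params), keep_prob=0.5, rng=rng)
updated = masked_sgd_step(params, grads, mask)
```

Parameters whose mask entry is 0 keep their pretrained values exactly, while the rest receive an ordinary gradient update; in a real training loop the same masking would be applied to each parameter tensor's gradient before the optimizer step.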

Country:
- Asia > China (0.04)
- North America > United States
  - Indiana > Hamilton County > Fishers (0.04)
- Europe > Romania
  - Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.65)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)