Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng
arXiv.org Artificial Intelligence
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes, fine-tuned models are readily available, but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, thus making it applicable to a wider set of scenarios.

The dominant paradigm for solving NLP tasks ranging from classification to sequence tagging involves fine-tuning a pretrained language model (PLM) using task-specific labeled data (Devlin et al., 2019; He et al., 2021). This results in specialized models that are explicitly trained to run inference over a single domain and task.
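The abstract states the merging objective only in words. A minimal sketch follows, assuming each model contains a linear layer y = xW_i and that "prediction difference" means squared error on each model's own layer inputs X_i; both are assumptions of this illustration, not details given above. Setting the gradient of sum_i ||X_i W - X_i W_i||^2 to zero gives (sum_i G_i) W = sum_i G_i W_i with G_i = X_i^T X_i, so in this sketch only the Gram matrices G_i enter the merge. The names `merge_linear` and `simple_average` are hypothetical.

```python
"""Hypothetical sketch: least-squares merging of one linear layer.

Assumption (not from the source): model i computes Y = X_i @ W_i, and the
merged W minimizes sum_i ||X_i @ W - X_i @ W_i||^2.
"""
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A @ X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment A with B so every row operation is applied to both.
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sum of Gram matrices is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # Left block is now the identity; right block holds the solution X.
    return [row[n:] for row in aug]


def merge_linear(inputs: List[Matrix], weights: List[Matrix]) -> Matrix:
    """Closed-form minimizer of sum_i ||X_i W - X_i W_i||^2."""
    # G_i = X_i^T X_i: only these Gram matrices are needed for the merge.
    grams = [matmul(transpose(x), x) for x in inputs]
    lhs = grams[0]
    rhs = matmul(grams[0], weights[0])
    for g, w in zip(grams[1:], weights[1:]):
        lhs = add(lhs, g)
        rhs = add(rhs, matmul(g, w))
    # Zero gradient: (sum_i G_i) W = sum_i G_i W_i.
    return solve(lhs, rhs)


def simple_average(weights: List[Matrix]) -> Matrix:
    """Plain parameter averaging, for comparison."""
    n = len(weights)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*weights)]


if __name__ == "__main__":
    # Model 2 sees a larger input, so the merge is pulled toward W_2 = 0.
    print(merge_linear([[[1.0], [1.0]], [[3.0]]], [[[1.0]], [[0.0]]]))
```

When every model has the same Gram matrix, this reduces to simple parameter averaging; differing inputs reweight each model's contribution. Pure-Python lists keep the sketch dependency-free; a practical version would use a numerical library and apply the merge layer by layer.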
Oct-12-2023