Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts
Sun, Wenju; Li, Qingyong; Wang, Wen; Geng, Yangli-ao; Li, Boyang
arXiv.org Artificial Intelligence
Multi-task model merging offers an efficient way to integrate knowledge from multiple fine-tuned models, mitigating the significant computational and storage demands of multi-task training. Despite the promising performance of task arithmetic (TA), conflicts can arise among the task vectors, particularly when different tasks require distinct model adaptations. In this paper, we formally define this issue as knowledge conflicts, characterized by the performance degradation of one task after merging with a model fine-tuned for another task. By restricting parameter merging to a trust region, Task Arithmetic in Trust Region (TATR) can effectively alleviate knowledge conflicts. Moreover, TATR serves as both an independent approach and a plug-and-play module compatible with a wide range of TA-based methods. Extensive empirical evaluations on eight distinct datasets demonstrate that TATR improves the multi-task performance of several TA-based model merging methods by an observable margin.

The growing adoption of large foundation models is accompanied by significant practical challenges in terms of computational and storage demands (Kaplan et al., 2020). To address these challenges, multi-task model merging (Matena & Raffel, 2022) has emerged as a promising solution. In task arithmetic, a task vector is the difference in model parameters between the pre-trained foundation model and its version fine-tuned on a specific task. This approach builds a high-performance multi-task model through simple arithmetic operations in the model parameter space, thereby reducing the computational overhead of fine-tuning on multiple tasks. Despite their successes, task arithmetic and its variants (Yadav et al., 2023; Wang et al., 2024; Yang et al., 2024b;a) still suffer from conflicts between task vectors.
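
As a minimal sketch of the task-vector arithmetic described above, the following Python example computes task vectors as parameter differences and adds their scaled sum back to the pre-trained parameters. The function names, the dictionary-of-lists parameter layout, the toy values, and the scaling coefficient are all illustrative assumptions, not the authors' implementation; the trust-region restriction used by TATR is not shown because its details are not given in this excerpt.

```python
# Illustrative sketch of plain task arithmetic (not TATR).
# All names, the parameter layout, and the scale value are assumptions.

def task_vector(pretrained, finetuned):
    """Return the per-parameter difference: finetuned - pretrained."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }

def merge(pretrained, task_vectors, scale=1.0):
    """Add the scaled sum of all task vectors to the pre-trained parameters."""
    merged = {}
    for name, base in pretrained.items():
        # Sum the task vectors element-wise for this parameter tensor.
        total = [sum(tv[name][i] for tv in task_vectors) for i in range(len(base))]
        merged[name] = [b + scale * t for b, t in zip(base, total)]
    return merged

# Toy "models": one parameter tensor with three entries each.
pretrained = {"w": [1.0, 2.0, 3.0]}
finetuned_a = {"w": [1.5, 2.0, 3.0]}  # task A changed the first entry
finetuned_b = {"w": [1.0, 2.0, 2.0]}  # task B changed the last entry

vectors = [task_vector(pretrained, m) for m in (finetuned_a, finetuned_b)]
merged = merge(pretrained, vectors, scale=0.5)
print(merged)  # {'w': [1.25, 2.0, 2.5]}
```

In this toy case the two task vectors touch different entries, so they do not interfere. When two task vectors push the same entry in opposite directions, their sum partially cancels, which is one simple way to picture the conflicts between task vectors mentioned above.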
Jan-24-2025
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks (0.46)
- Natural Language (1.00)
- Representation & Reasoning
- Information Fusion (0.48)
- Optimization (0.46)
- Vision (0.93)
- Data Science (1.00)