A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

Tang, Qiaoyu, Yu, Le, Yu, Bowen, Lin, Hongyu, Lu, Keming, Lu, Yaojie, Han, Xianpei, Sun, Le

Oct-17-2024, arXiv.org Artificial Intelligence

Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks. Its effects are fully reflected by delta parameters (i.e., the difference between post-trained and pre-trained parameters). While numerous studies have explored the properties of delta parameters via operations such as pruning, quantization, low-rank approximation, and extrapolation, a unified framework for systematically examining these characteristics has been lacking. In this paper, we propose a novel perspective based on a Riemann sum approximation of the loss function to elucidate delta parameter editing operations. Our analysis groups existing methods into three classes according to their post-editing performance (competitive, decreased, and improved) and explains how each class is expressed by the Riemann sum approximation term and how it alters model performance. Extensive experiments on both visual and language models, including ViT, LLaMA 3, Qwen 2, and Mistral, corroborate our theoretical findings. Furthermore, we introduce extensions to existing techniques such as DARE and BitDelta, highlighting their limitations in leveraging the properties of delta parameters and reorganizing them into general expressions to enhance the applicability and effectiveness of delta parameter editing in post-trained models.

With the remarkable success of large-scale pre-trained models, post-training has emerged as the de facto standard paradigm for effectively adapting them to various tasks (Han et al., 2024; Xin et al., 2024; Dodge et al., 2020; Zhao et al., 2023).
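To make the notion of delta parameter editing concrete, the following is a minimal sketch, not the authors' implementation. It computes delta parameters as defined above and applies two illustrative edits. The abstract does not spell out either method, so the DARE-style rule (randomly drop entries, rescale the survivors by 1/(1 - p)) and the BitDelta-style rule (keep only the sign, scaled by the mean absolute value) are assumptions based on the commonly cited descriptions of those techniques. All function names are hypothetical, and parameters are flattened to plain Python lists for simplicity.

```python
import random


def delta_parameters(pretrained, post_trained):
    """Delta = post-trained minus pre-trained, element by element."""
    return [post - pre for pre, post in zip(pretrained, post_trained)]


def dare_style_edit(delta, drop_rate, rng):
    """Assumed DARE-style edit: drop each entry with probability
    drop_rate and rescale the kept entries by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * keep_scale for d in delta]


def bitdelta_style_edit(delta):
    """Assumed BitDelta-style edit: replace each entry by its sign times
    the mean absolute value. Zero entries get a negative sign here,
    which is a choice made for this sketch."""
    scale = sum(abs(d) for d in delta) / len(delta)
    return [scale if d > 0 else -scale for d in delta]


def apply_delta(pretrained, delta):
    """Rebuild edited parameters by adding a delta back onto the base."""
    return [pre + d for pre, d in zip(pretrained, delta)]


# Toy values, chosen only for illustration.
pretrained = [0.5, -1.0, 2.0, 0.0]
post_trained = [0.75, -1.5, 2.0, 1.0]

delta = delta_parameters(pretrained, post_trained)
pruned = dare_style_edit(delta, drop_rate=0.5, rng=random.Random(0))
quantized = bitdelta_style_edit(delta)
edited_model = apply_delta(pretrained, quantized)
```

In this sketch every edit acts on the delta alone and the pre-trained parameters stay fixed, which mirrors the setting the abstract describes: the edited model is always the pre-trained model plus an edited delta.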

large language model, machine learning, natural language

Country:
- North America > United States (0.28)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.66)
  - Natural Language > Large Language Model (0.89)
  - Vision (1.00)