LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

Roh, Yuji, Liu, Qingyun, Gui, Huan, Yuan, Zhe, Tang, Yujin, Whang, Steven Euijong, Liu, Liang, Bi, Shuchao, Hong, Lichan, Chi, Ed H., Zhao, Zhe

Feb-7-2024–arXiv.org Artificial Intelligence

Fine-tuning is becoming widely used to leverage the power of pre-trained foundation models in new downstream tasks. While fine-tuning has succeeded on many tasks, recent studies have observed that fine-tuned models can struggle to generalize to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitations of fine-tuning data and regulate fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt its OOD generalization. This can be especially catastrophic when new tasks come from different (sub)domains than the pre-training data. To address the issues in both pre-training and fine-tuning data, we propose a novel generalizable fine-tuning method, LEVI, in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model while preserving training and inference efficiency. By combining two complementary models, LEVI effectively suppresses problematic features in both the fine-tuning data and the pre-trained model and preserves useful features for new tasks. Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization by emphasizing different views from fine-tuning data and pre-trained features.
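
To make the idea of a layer-wise ensemble concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes that features from each layer of a frozen pre-trained backbone are concatenated with the output of a small task-specific model, passed through a per-layer head, and averaged. The class name `LayerwiseEnsemble`, the helper functions, the random weights standing in for a pre-trained model, and the averaging rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
import random


def linear(x, w, b):
    """Affine map: x is a vector, w is a list of rows (out x in), b is a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_linear(rng, n_in, n_out):
    """Random weights; these only stand in for real (pre-)trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class LayerwiseEnsemble:
    """Hypothetical layer-wise ensemble of a backbone and a small model."""

    def __init__(self, dim_in, dim_hidden, n_layers, dim_small, n_out, seed=0):
        rng = random.Random(seed)
        # Stand-in for the frozen pre-trained model: a stack of layers.
        self.backbone = [
            init_linear(rng, dim_in if i == 0 else dim_hidden, dim_hidden)
            for i in range(n_layers)
        ]
        # Small task-specific model that sees the raw input (assumption).
        self.small = init_linear(rng, dim_in, dim_small)
        # One head per backbone layer, fed [layer feature ; small-model feature].
        # In a fine-tuning setup, only these heads and the small model
        # would be trained (assumption).
        self.heads = [
            init_linear(rng, dim_hidden + dim_small, n_out) for _ in range(n_layers)
        ]

    def forward(self, x):
        small_feat = relu(linear(x, *self.small))
        h = x
        per_layer = []
        for (w, b), (hw, hb) in zip(self.backbone, self.heads):
            h = relu(linear(h, w, b))
            # `h + small_feat` is list concatenation, not element-wise addition.
            per_layer.append(linear(h + small_feat, hw, hb))
        n = len(per_layer)
        # Average the per-layer predictions to form the ensemble output.
        ensemble = [sum(p[k] for p in per_layer) / n for k in range(len(per_layer[0]))]
        return ensemble, per_layer


model = LayerwiseEnsemble(dim_in=4, dim_hidden=8, n_layers=3, dim_small=2, n_out=5)
output, per_layer_outputs = model.forward([0.1, -0.2, 0.3, 0.4])
```

In this sketch each head can learn how much weight to give the backbone feature at its depth versus the small model's feature, which is one simple way to realize an adaptive combination of the two views. The extra cost beyond the backbone's forward pass is one small model and a few linear heads.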
