DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Oh, Changdae, Li, Yixuan, Song, Kyungwoo, Yun, Sangdoo, Han, Dongyoon
arXiv.org Artificial Intelligence
Adapting a pre-trained foundation model to downstream tasks should ensure robustness against distribution shifts without the need to retrain the whole model. Existing weight interpolation methods are simple and effective, but we argue that their static nature, while efficient, limits downstream performance. In this work, we propose DaWin, a training-free dynamic weight interpolation method that leverages the entropy of individual models over each unlabeled test sample to assess model expertise and computes per-sample interpolation coefficients dynamically. Unlike previous works that typically rely on additional training to learn such coefficients, our approach requires no training. We also propose a mixture modeling approach that greatly reduces the inference overhead incurred by dynamic interpolation. We validate DaWin on large-scale visual recognition benchmarks, spanning 14 tasks across robust fine-tuning (ImageNet and five derived distribution shift benchmarks) and multi-task learning with eight classification tasks. Results demonstrate that DaWin achieves significant performance gains in the considered settings, with minimal computational overhead. We further discuss DaWin's analytic behavior to explain its empirical success.
The emergence of foundation models (Bommasani et al., 2021; Radford et al., 2021; Brown et al., 2020) has significantly lowered the barrier to deploying artificial intelligence solutions across a wide range of real-world problems. Leveraging the strong general knowledge acquired through large-scale pre-training, foundation models can be efficiently adapted for numerous tasks. However, recent studies have shown that while fine-tuning improves performance on specific downstream tasks, it often undermines the model's generalizability and robustness (Wortsman et al., 2022b). For example, a model fine-tuned on ImageNet has better accuracy on in-distribution (ID) data yet may underperform on out-of-distribution (OOD) data such as ImageNet-A (Hendrycks et al., 2021b).
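The entropy-based, per-sample interpolation described in the abstract can be illustrated with a minimal sketch. The text above does not state the exact coefficient formula, so the softmax over negative entropies used here is an assumption, as are all function names (`softmax`, `entropy`, `interpolation_coefficient`, `interpolate_weights`) and the flat-list representation of model weights. It is meant only to show the idea that the more confident (lower-entropy) model receives the larger weight for a given test sample.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def interpolation_coefficient(logits_pre, logits_ft):
    """Per-sample weight on the fine-tuned model.

    ASSUMPTION: the coefficient is a softmax over negative entropies,
    lam = exp(-H_ft) / (exp(-H_pre) + exp(-H_ft)); the source text does
    not give the formula, so this is one plausible choice.
    """
    h_pre = entropy(softmax(logits_pre))
    h_ft = entropy(softmax(logits_ft))
    # Algebraically equal to the softmax form above.
    return 1.0 / (1.0 + math.exp(h_ft - h_pre))


def interpolate_weights(theta_pre, theta_ft, lam):
    """Linear interpolation (1 - lam) * theta_pre + lam * theta_ft.

    Weights are flat lists of floats here purely for illustration.
    """
    return [(1.0 - lam) * w0 + lam * w1 for w0, w1 in zip(theta_pre, theta_ft)]


# Hypothetical logits for one unlabeled test sample from each model.
lam = interpolation_coefficient([0.2, 0.1, 0.0], [4.0, 0.5, 0.1])
merged = interpolate_weights([0.0, 2.0], [1.0, 4.0], lam)
```

In this toy sample the fine-tuned model is the more confident of the two, so `lam` exceeds 0.5 and the merged weights lean toward the fine-tuned weights. Computing a separate merged model for every sample is costly, which is the overhead the mixture modeling approach mentioned above is intended to reduce; that component is not sketched here.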
Oct-3-2024