A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
Trivedi, Puja, Koutra, Danai, Thiagarajan, Jayaraman J.
– arXiv.org Artificial Intelligence
Advances in the expressivity of pretrained models have increased interest in the design of adaptation protocols that enable safe and effective transfer learning. Going beyond conventional linear probing (LP) and fine-tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution data, have been found to achieve improved out-of-distribution (OOD) generalization. To limit this distortion, the LP+FT protocol, which first learns a linear probe and then uses this initialization for subsequent FT, was proposed. However, in this paper, we find that when adaptation protocols (LP, FT, LP+FT) are also evaluated on a variety of safety objectives (e.g., calibration, robustness), a complementary perspective to feature distortion is helpful for explaining protocol behavior. To this end, we study the susceptibility of protocols to simplicity bias (SB), i.e., the well-known propensity of deep neural networks to rely upon simple features, as SB has recently been shown to underlie several problems in robust generalization. Using a synthetic dataset, we demonstrate the susceptibility of existing protocols to SB. Given the strong effectiveness of LP+FT, we then propose modified linear probes that help mitigate SB and lead to better initializations for subsequent FT. We verify the effectiveness of the proposed LP+FT variants in decreasing SB in a controlled setting, as well as their ability to improve OOD generalization and safety on three adaptation datasets.

Indeed, representations from such high-quality SSL models have been found to be more robust (Hendrycks et al., 2019; Liu et al., 2021), more transferable (Ericsson et al., 2021), and more semantically consistent (Caron et al., 2021) than their supervised counterparts. In this regard, there is a growing need for adaptation protocols that explicitly capitalize on these improved pretrained features to induce similarly beneficial properties.

[Figure 1: Strong and Safe Adaptation.]

Recently, however, Kumar et al. (2022) proved that by modifying features only in the in-distribution (ID) representation subspace, FT can lead to higher OOD error, as it distorts directions outside the ID subspace that are needed for OOD generalization. Because both ID and OOD subspaces are represented by the pretrained model, Kumar et al. demonstrate that limiting feature distortion, i.e., controlling updates toward the ID subspace, can lead to improved ID and OOD performance.
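As a concrete illustration of the LP+FT recipe discussed above, the sketch below (in PyTorch) first trains a linear probe on frozen pretrained features, then uses the learned head to initialize full fine-tuning. This is a minimal sketch under illustrative assumptions: the ResNet-50 backbone, learning rates, and epoch counts are placeholders, not the paper's exact configuration.

```python
# Minimal LP+FT sketch: stage 1 trains only the linear head on frozen
# features; stage 2 fine-tunes the whole network from that initialization,
# which Kumar et al. (2022) show limits feature distortion.
# Backbone choice and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

def lp_then_ft(train_loader, num_classes, lp_epochs=10, ft_epochs=10,
               lp_lr=1e-2, ft_lr=1e-4, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.to(device)
    criterion = nn.CrossEntropyLoss()

    # Stage 1: linear probing -- freeze the backbone, train only the head.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(model.fc.parameters(), lr=lp_lr, momentum=0.9)
    for _ in range(lp_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

    # Stage 2: full fine-tuning -- unfreeze everything; the trained probe
    # now initializes the head, so early FT gradients are smaller and the
    # pretrained features are distorted less.
    for p in model.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(model.parameters(), lr=ft_lr, momentum=0.9)
    for _ in range(ft_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return model
```

A common design choice, reflected above, is to use a much smaller learning rate for the FT stage than for the probe, since the FT stage updates the pretrained backbone directly.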
Mar-23-2023