waterbird
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
On the Unreasonable Effectiveness of Last-layer Retraining
Hill, John C., LaBonte, Tyler, Zhang, Xinchen, Muthukumar, Vidya
Last-layer retraining (LLR) methods, wherein the last layer of a neural network is reinitialized and retrained on a held-out set following ERM training, have garnered interest as an efficient approach to rectify dependence on spurious correlations and improve performance on minority groups. Surprisingly, LLR has been found to improve worst-group accuracy even when the held-out set is an imbalanced subset of the training set. We initially hypothesize that this "unreasonable effectiveness" of LLR is explained by its ability to mitigate neural collapse through the held-out set, resulting in the implicit bias of gradient descent benefiting robustness. Our empirical investigation does not support this hypothesis. Instead, we present strong evidence for an alternative hypothesis: that the success of LLR is primarily due to better group balance in the held-out set. We conclude by showing how the recent algorithms CB-LLR and AFR perform implicit group-balancing to elicit a robustness improvement.
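The procedure the abstract describes is simple to sketch. Below is a minimal PyTorch illustration of LLR, not the authors' implementation: the head attribute `model.fc`, the `heldout_loader`, and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

def last_layer_retrain(model, heldout_loader, num_classes, epochs=10, lr=1e-3):
    """Sketch of last-layer retraining (LLR) after ERM training."""
    # Freeze the ERM-trained feature extractor.
    for p in model.parameters():
        p.requires_grad = False

    # Reinitialize the final linear layer (assumes the head is `model.fc`).
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    optimizer = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in heldout_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```

Under the paper's group-balance hypothesis, the held-out loader is the key ingredient: one simple way to approximate explicit group balancing is to build it with `torch.utils.data.WeightedRandomSampler`, weighting each example inversely to its group's frequency.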
Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space
Pal, Shivam, Varshney, Sakshi, Rai, Piyush
Deep neural networks are prone to learning shortcuts: spurious, easily learned correlations in the training data that cause severe failures in out-of-distribution (OOD) generalization. A dominant line of work seeks robustness by learning a robust representation, often explicitly partitioning the latent space into core and spurious components; this approach can be complex, brittle, and difficult to scale. We take a different approach: instead of a robust representation, we learn a robust function. We present a simple and effective training method that renders the classifier functionally invariant to shortcut signals. Our method operates within a disentangled latent space, which is essential as it isolates spurious and core features into distinct dimensions. This separation enables the identification of candidate shortcut features by their strong correlation with the label, used as a proxy for semantic simplicity. The classifier is then desensitized to these features by injecting targeted, anisotropic latent noise during training. We analyze this as targeted Jacobian regularization, which forces the classifier to ignore spurious features and rely on more complex, core semantic signals. The result is state-of-the-art OOD performance on established shortcut learning benchmarks.
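As a rough illustration of the two steps the abstract outlines (a correlation screen over disentangled latent dimensions, then targeted anisotropic noise injection), here is a hedged PyTorch sketch. It assumes a pretrained disentangled encoder producing codes `z`; the helper names, correlation threshold, and noise scale are illustrative assumptions, not the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def shortcut_noise_scales(z, y, threshold=0.5, sigma=1.0):
    """Flag candidate shortcut dimensions by label correlation.

    Dimensions of the disentangled code `z` whose absolute correlation
    with the label exceeds `threshold` get noise scale `sigma`; the
    rest get zero. Both values are illustrative assumptions.
    """
    z_c = z - z.mean(dim=0)
    y_c = y.float() - y.float().mean()
    corr = (z_c * y_c.unsqueeze(1)).mean(dim=0) / (
        z_c.std(dim=0) * y_c.std() + 1e-8
    )
    return torch.where(
        corr.abs() > threshold,
        torch.full_like(corr, sigma),
        torch.zeros_like(corr),
    )

def train_step(classifier, optimizer, z, y, scales):
    """One classifier step with targeted, anisotropic latent noise."""
    # Zero-mean Gaussian noise, nonzero only along flagged shortcut dims.
    z_noisy = z + torch.randn_like(z) * scales
    loss = F.cross_entropy(classifier(z_noisy), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The Jacobian reading follows from a standard second-order argument: for small noise epsilon ~ N(0, diag(s^2)), training on z + epsilon approximately penalizes the expected squared norm of diag(s) times the classifier's input Jacobian, so sensitivity is suppressed only along the flagged shortcut dimensions while core dimensions are untouched.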