Collaborating Authors

 Lai, Kuo-Wei


Task Shift: From Classification to Regression in Overparameterized Linear Models

arXiv.org Machine Learning

Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression setting where the task shifts from classification during training to regression during evaluation. In the zero-shot case, wherein no regression data is available, we prove that task shift is impossible in both sparse signal and random signal models for any Gaussian covariate distribution. In the few-shot case, wherein limited regression data is available, we propose a simple postprocessing algorithm which asymptotically recovers the ground-truth predictor. Our analysis leverages a fine-grained characterization of individual parameters arising from minimum-norm interpolation which may be of independent interest. Our results show that while minimum-norm interpolators for classification cannot transfer to regression a priori, they experience surprisingly structured attenuation which enables successful task shift with limited additional data.
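Below is a minimal numerical sketch of the setting described in this abstract, under illustrative assumptions (a sparse signal and isotropic Gaussian covariates); it is not the paper's algorithm or analysis. It fits the minimum-norm interpolator to sign labels and inspects both its zero-shot regression error and the per-coordinate attenuation of the retained signal.

```python
# Illustrative sketch only: sparse signal, isotropic Gaussian covariates.
# We fit the minimum-norm interpolator to *sign* labels and inspect
# (i) how it performs on the regression task zero-shot and
# (ii) the per-coordinate attenuation of the signal it retains.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 400, 1200, 10               # n samples, p >> n features, k-sparse signal
theta_star = np.zeros(p)
theta_star[:k] = 1.0

X = rng.standard_normal((n, p))       # Gaussian covariates
y_cls = np.sign(X @ theta_star)       # classification labels seen during training

# Minimum-norm interpolator of the classification labels:
# argmin ||theta|| subject to X theta = y_cls.
theta_hat = X.T @ np.linalg.solve(X @ X.T, y_cls)

# Zero-shot regression error, compared with the trivial zero predictor.
X_test = rng.standard_normal((5000, p))
mse_zero_shot = np.mean((X_test @ theta_hat - X_test @ theta_star) ** 2)
mse_trivial = np.mean((X_test @ theta_star) ** 2)
print(f"zero-shot regression MSE: {mse_zero_shot:.2f}  (trivial predictor: {mse_trivial:.2f})")

# Structured attenuation: on the support, theta_hat is roughly a constant
# multiple of theta_star; off the support it is diffuse, small-magnitude noise.
print(f"mean of theta_hat on support:   {theta_hat[:k].mean():.3f}")
print(f"RMS of theta_hat off support:   {np.sqrt(np.mean(theta_hat[k:] ** 2)):.3f}")
```

In runs of this sketch the zero-shot error is close to that of the trivial predictor, while the interpolator still carries a uniformly attenuated copy of the signal on its support, which is the kind of structure a few-shot postprocessing step could exploit.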


Sharp analysis of out-of-distribution error for "importance-weighted" estimators in the overparameterized regime

arXiv.org Machine Learning

Overparameterized models are ubiquitous in machine learning theory and practice today because of their state-of-the-art generalization guarantees (in the sense of low test error) even while perfectly fitting the training data [30, 7]. However, this "good generalization" property does not extend to test data that is distributed differently from training data, termed out-of-distribution (OOD) data [20, 21, 29]. A particularly acute scenario arises when the data is drawn as a mixture from multiple groups (each with a different distribution) and some groups are very under-represented in training data [2]. Under such models, the worst-group generalization error can be significantly degraded with respect to the average generalization error on all groups [1, 27, 21, 20]. The effect of distribution shift on generalization has been sharply characterized in a worst-case/minimax sense, e.g.
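The following sketch illustrates the group-imbalance phenomenon discussed above under simple, assumed distributions (two Gaussian groups with the minority group under-represented and differently scaled); the setup and numbers are illustrative and are not taken from the paper. It fits the minimum-norm interpolator and compares majority-group error with minority-group (out-of-distribution) error.

```python
# Illustrative sketch: two-group Gaussian mixture, minority group
# under-represented in training, evaluated per group at test time.
import numpy as np

rng = np.random.default_rng(1)
p = 1500
n_major, n_minor = 190, 10            # minority group is heavily under-represented
theta_star = rng.standard_normal(p) / np.sqrt(p)

# Minority covariates put more mass on the leading coordinates.
scale_minor = np.ones(p)
scale_minor[: p // 10] = 3.0

X = np.vstack([
    rng.standard_normal((n_major, p)),
    rng.standard_normal((n_minor, p)) * scale_minor,
])
y = X @ theta_star + 0.1 * rng.standard_normal(n_major + n_minor)

# Minimum-norm interpolator (ridgeless least squares), p >> n.
theta_hat = X.T @ np.linalg.solve(X @ X.T, y)

def group_mse(scale, n_test=5000):
    """Test error on a group whose covariates are scaled coordinatewise."""
    Xt = rng.standard_normal((n_test, p)) * scale
    return np.mean((Xt @ (theta_hat - theta_star)) ** 2)

print(f"majority-group MSE: {group_mse(np.ones(p)):.4f}")
print(f"minority-group MSE: {group_mse(scale_minor):.4f}")
```

In this toy setting the minority-group error exceeds the majority-group error, illustrating the gap between worst-group and average generalization that motivates importance-weighted estimators.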


General Loss Functions Lead to (Approximate) Interpolation in High Dimensions

arXiv.org Artificial Intelligence

We provide a unified framework, applicable to a general family of convex losses and across binary and multiclass settings in the overparameterized regime, to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework builds directly on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques, which we use to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
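The sketch below numerically illustrates the kind of approximate equivalence the abstract describes, for one specific loss (logistic) in a high-dimensional binary problem; it is an assumed toy experiment, not a reproduction of the paper's framework. It runs gradient descent on the unregularized logistic loss and compares the resulting direction with the minimum-norm interpolator of the +/-1 labels.

```python
# Illustrative sketch: compare the direction found by gradient descent on the
# logistic loss with the minimum-norm interpolator of the labels when p >> n.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(2)
n, p = 50, 2000                        # heavily overparameterized
X = rng.standard_normal((n, p))
y = np.sign(rng.standard_normal(n))    # +/-1 labels; separable since p >> n

# Minimum-norm interpolator: argmin ||w|| subject to X w = y.
w_mni = X.T @ np.linalg.solve(X @ X.T, y)

# Gradient descent on the unregularized logistic loss.
w = np.zeros(p)
lr = 0.05
for _ in range(10000):
    margins = y * (X @ w)
    grad = -(X.T @ (y * expit(-margins))) / n
    w -= lr * grad

cos = (w @ w_mni) / (np.linalg.norm(w) * np.linalg.norm(w_mni))
print(f"cosine similarity between GD direction and min-norm interpolator: {cos:.4f}")
```

In this regime the cosine similarity is close to one, consistent with the claim that the implicit bias of gradient descent is well approximated, though not exactly given, by minimum-norm interpolation in high dimensions.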