Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
Neural Information Processing Systems
We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star - 1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient-based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples.
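To make the definition of the information exponent concrete, the sketch below numerically estimates $k^\star$ for a given link function by computing its Hermite coefficients under the standard Gaussian measure with Gauss-Hermite quadrature. This is an illustrative helper, not code from the paper; the function name `information_exponent` and all tolerance/truncation parameters are assumptions.

```python
import numpy as np
from math import factorial

def information_exponent(sigma, max_k=10, n_quad=200, tol=1e-8):
    """Index of the first nonzero Hermite coefficient of `sigma`
    under N(0, 1) -- a hypothetical helper illustrating the definition."""
    # Gauss-HermiteE quadrature: nodes/weights for the weight exp(-x^2/2);
    # dividing the weights by sqrt(2*pi) gives integration against N(0, 1).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / np.sqrt(2.0 * np.pi)
    vals = sigma(nodes)
    for k in range(max_k + 1):
        # Probabilists' Hermite polynomial He_k, normalized to unit L2 norm.
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        he_k = np.polynomial.hermite_e.hermeval(nodes, basis) / np.sqrt(factorial(k))
        if abs(np.sum(weights * vals * he_k)) > tol:
            return k
    return None  # all coefficients up to max_k are numerically zero

# Example: He_3(z) = z^3 - 3z is orthogonal to He_0, He_1, He_2, so k* = 3.
print(information_exponent(lambda z: z**3 - 3.0 * z))  # -> 3
```

For such a link function with $k^\star = 3$, the result above says the smoothed online SGD needs $n \gtrsim d^{3/2}$ samples rather than the $d^{2}$ required by plain online SGD.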