Volgushev, Stanislav
Universality of Benign Overfitting in Binary Linear Classification
Hashimoto, Ichiro, Volgushev, Stanislav, Zwiernik, Piotr
The practical success of deep learning has led to the discovery of several surprising phenomena. One of these phenomena, which has spurred intense theoretical research, is ``benign overfitting'': deep neural networks seem to generalize well in the over-parametrized regime even though they fit noisy training data perfectly. It is now known that benign overfitting also occurs in various classical statistical models. For linear maximum margin classifiers, benign overfitting has been established theoretically in a class of mixture models with very strong assumptions on the covariate distribution. However, even in this simple setting, many questions remain open. For instance, most of the existing literature focuses on the noiseless case where all true class labels are observed without errors, whereas the more interesting noisy case remains poorly understood. We provide a comprehensive study of benign overfitting for linear maximum margin classifiers. We discover a previously unknown phase transition in test error bounds for the noisy model and provide some geometric intuition behind it. We further considerably relax the required covariate assumptions in both the noisy and the noiseless cases. Our results demonstrate that benign overfitting of maximum margin classifiers holds in a much wider range of scenarios than was previously known and provide new insights into the underlying mechanisms.
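The following is a minimal, hypothetical Python simulation (not the setup or estimator from the paper) of the phenomenon the abstract describes: in a high-dimensional Gaussian mixture with a fraction of flipped labels, a large-C linear SVM interpolates the noisy training labels yet can still classify new, clean data well. The dimensions, signal strength, and noise rate below are arbitrary illustrative choices.

```python
# Illustrative sketch of benign overfitting for a max-margin linear classifier.
# All parameter values are hypothetical; this is not the paper's mixture model.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d, noise_rate = 50, 2000, 0.1            # samples, dimension >> n, label-noise rate
mu = np.zeros(d)
mu[0] = 3.0                                  # class-mean direction / signal strength

y = rng.choice([-1, 1], size=n)              # true labels
X = y[:, None] * mu + rng.standard_normal((n, d))
y_noisy = y * np.where(rng.random(n) < noise_rate, -1, 1)   # flip a fraction of labels

clf = SVC(kernel="linear", C=1e6)            # large C approximates the hard-margin (max-margin) solution
clf.fit(X, y_noisy)
print("train accuracy on noisy labels:", clf.score(X, y_noisy))   # typically 1.0: the fit interpolates

y_test = rng.choice([-1, 1], size=2000)      # evaluate against clean labels
X_test = y_test[:, None] * mu + rng.standard_normal((2000, d))
print("test accuracy on clean labels:", clf.score(X_test, y_test))
```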
Group structure estimation for panel data -- a general approach
Yu, Lu, Gu, Jiaying, Volgushev, Stanislav
Panel data models are a standard empirical tool in statistics, economics, marketing, and financial research. The conventional modeling approach assumes that all individual heterogeneity can be summarized by an individual-specific intercept, often known as a fixed effect, while all covariates have a common effect across individuals, so that information can be pooled across individuals to estimate these common parameters efficiently. However, heterogeneous responses to observed control variables are often better supported by empirical evidence, especially as detailed individual-level data become more available. An increasingly popular approach to modeling unobserved heterogeneity in the effects of covariates on individual responses is to assume the existence of a finite number of homogeneous groups.
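As an illustration only (a generic two-step heuristic, not the estimator proposed in the paper), the sketch below simulates panel data whose slope coefficients are shared within latent groups, estimates an individual slope for each unit after removing its fixed effect, and clusters these estimates to recover the group structure. All dimensions and parameter values are hypothetical.

```python
# Minimal sketch of group-structured panel data: individuals share covariate
# effects only within latent groups. A simple two-step approach runs individual
# within-transformed regressions and clusters the estimated slopes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
N, T, G = 200, 40, 3                            # individuals, time periods, groups (hypothetical)
true_betas = np.array([[1.0], [0.0], [-1.0]])   # group-specific slopes (hypothetical)
groups = rng.integers(G, size=N)

beta_hat = np.empty((N, 1))
for i in range(N):
    x = rng.standard_normal((T, 1))
    alpha_i = rng.standard_normal()             # individual fixed effect
    y = alpha_i + x @ true_betas[groups[i]] + 0.5 * rng.standard_normal(T)
    x_c, y_c = x - x.mean(0), y - y.mean()      # within transformation removes alpha_i
    beta_hat[i] = np.linalg.lstsq(x_c, y_c, rcond=None)[0]

labels = KMeans(n_clusters=G, n_init=10, random_state=0).fit_predict(beta_hat)
print("estimated group sizes:", np.bincount(labels))
```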
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Yu, Lu, Balasubramanian, Krishnakumar, Volgushev, Stanislav, Erdogdu, Murat A.
Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well studied in the non-convex setting. To address this shortcoming, in this work we establish an asymptotic normality result for constant step size stochastic gradient descent (SGD), an algorithm widely used in practice. Specifically, based on the relationship between SGD and Markov chains [DDB19], we show that the average of the SGD iterates is asymptotically normally distributed around the expected value of their unique invariant distribution, as long as the non-convex and non-smooth objective function satisfies a dissipativity property. We also characterize the bias between this expected value and the critical points of the objective function under various local regularity conditions. Together, these two results can be leveraged to construct confidence intervals for non-convex problems trained using SGD.
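The sketch below (a toy one-dimensional example, not the paper's setting or assumptions) illustrates constant step size SGD with Polyak-Ruppert style iterate averaging: over many independent runs, the averaged iterate concentrates around the mean of the stationary distribution of the SGD chain, which in general sits at a step-size-dependent distance from the critical point of the objective.

```python
# Toy illustration of constant step size SGD with iterate averaging.
# Objective, noise model, and all tuning constants are hypothetical.
import numpy as np

rng = np.random.default_rng(2)

def stochastic_grad(theta):
    # gradient of 0.5 * (theta - 1)^2 plus zero-mean noise (a simple dissipative objective)
    return (theta - 1.0) + rng.standard_normal()

def averaged_sgd(step=0.05, n_iter=5000, burn_in=1000):
    theta, running_sum = 0.0, 0.0
    for t in range(n_iter):
        theta -= step * stochastic_grad(theta)   # constant step size update
        if t >= burn_in:
            running_sum += theta                 # Polyak-Ruppert style averaging
    return running_sum / (n_iter - burn_in)

# Repeat the run to inspect the approximately normal spread of the averaged iterate.
averages = np.array([averaged_sgd() for _ in range(500)])
print("mean of averaged iterates:", averages.mean())
print("std  of averaged iterates:", averages.std())
```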