generalization of SGD in SCO is well-established, and we are left with the question of how well we can account for generalization through an investigation of its bias
–Neural Information Processing Systems
One of the great mysteries of contemporary machine learning is the impressive success of unregularized and overparameterized learning algorithms. Specifically, current machine learning practice is to train models with far more parameters than samples and let the algorithm fit the data, often without any explicit regularization. In fact, these algorithms have so much capacity that they can even memorize and fit random data (Neyshabur et al., 2015; Zhang et al., 2017). Yet, when trained on real-life data, they generalize remarkably well to unseen samples. This phenomenon is often attributed to what is described as the implicit regularization of an algorithm (Neyshabur et al., 2015). Implicit regularization roughly refers to the learner's tendency to implicitly choose certain structured solutions, as if some explicit regularization term appeared in its objective. As a canonical example, in linear optimization one can show that various forms of gradient descent, an a priori unregularized algorithm, behave identically to regularized risk minimization penalized with the squared Euclidean norm of the parameters (Cesa-Bianchi and Lugosi, 2006); a minimal numerical sketch of this effect is given below. Understanding implicit regularization poses several interesting challenges. For example: how can we identify the implicit bias of a given learning algorithm?
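To make the canonical example above concrete, the following sketch runs plain gradient descent on an underdetermined least-squares problem. The dimensions, step size, and iteration count are illustrative assumptions rather than values from the text; the point is that gradient descent initialized at the origin converges to the minimum Euclidean-norm interpolating solution, i.e., the solution singled out by explicit squared-norm regularization in the vanishing-regularization limit.

import numpy as np

# Minimal sketch of the canonical example: gradient descent on an
# underdetermined least-squares problem, initialized at the origin,
# converges to the minimum Euclidean-norm interpolating solution.
# All dimensions, the step size, and the iteration count below are
# illustrative assumptions, not values taken from the text.
rng = np.random.default_rng(0)
n, d = 20, 100                        # far more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                       # initialization at the origin matters
lr = 0.01
for _ in range(10_000):
    w -= lr * X.T @ (X @ w - y) / n   # gradient step on the empirical squared loss

w_min_norm = np.linalg.pinv(X) @ y    # minimum-norm solution to X w = y
print("training residual:", np.linalg.norm(X @ w - y))          # ~0
print("distance to min-norm:", np.linalg.norm(w - w_min_norm))  # ~0

Among the infinitely many parameter vectors that fit the data exactly, gradient descent implicitly selects the one of smallest Euclidean norm, even though no norm penalty appears anywhere in its objective.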