On the Minimal Error of Empirical Risk Minimization
An increasing number of machine learning applications employ flexible, overparameterized models to fit the training data. Theoretical analysis of such 'overfitted' solutions has been a recent focus of the learning community. It is conjectured that the use of large overparameterized neural networks makes the loss landscape amenable to optimization through local search methods, such as stochastic gradient descent. It is also hypothesized that implicit regularization, arising from the choice of the optimization algorithm and the neural network architecture, mitigates the large complexity and ensures that the 'overfitted' solutions generalize. Suppose a 'simple' class H of models captures the relationship between the covariates X and the response variable Y. Inspired by the use of overparameterized models, we may take a much larger class F ⊇ H, for computational or other purposes (such as the lack of an explicit description of H), and minimize the training loss over this larger class.
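As a point of reference, a minimal sketch of the resulting empirical risk minimization problem is the following, assuming n i.i.d. training pairs (X_1, Y_1), ..., (X_n, Y_n) and, for concreteness, the squared loss (the text above does not fix a particular loss):

\[
\hat{f} \in \operatorname*{arg\,min}_{f \in F} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( f(X_i) - Y_i \bigr)^2 .
\]

The question suggested by the title is then how small the error of such a minimizer over the larger class F can be when the true relationship is captured by the smaller class H.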
Feb-23-2021