On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Kur, Gil, Putterman, Eli, Rakhlin, Alexander
arXiv.org Artificial Intelligence
Maximum Likelihood and the method of Least Squares are fundamental procedures in statistics. The study of asymptotic consistency of Maximum Likelihood has been central to the field for almost a century (Wald, 1949). Along with consistency, failures of Maximum Likelihood have been thoroughly investigated for nearly as long (Neyman and Scott, 1948; Bahadur, 1958; Ferguson, 1982). In the context of estimation of nonparametric models, the seminal work of Birgé and Massart (1993) provided sufficient conditions for minimax optimality (in a non-asymptotic sense) of Least Squares, while also presenting an example of a model class where this basic procedure is sub-optimal. Three decades later, we still do not have necessary and sufficient conditions for minimax optimality of Least Squares when the model class is large. While the present paper does not resolve this question, it takes several steps towards understanding the behavior of Least Squares, equivalently Empirical Risk Minimization (ERM) with square loss, in large models. Beyond intellectual curiosity, the question of minimax optimality of Least Squares is driven by the desire to understand the current practice of fitting large or overparametrized models, such as neural networks, to data (cf.
May-29-2023
- Country:
- North America > United States
- Rhode Island > Providence County > Providence (0.04)
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Oxfordshire > Oxford (0.04)
- Asia > Middle East
- Israel > Tel Aviv District > Tel Aviv (0.04)
- Genre:
- Research Report > New Finding (0.67)