On the Variance, Admissibility, and Stability of Empirical Risk Minimization

Kur, Gil, Putterman, Eli, Rakhlin, Alexander

May-29-2023–arXiv.org Artificial Intelligence

Maximum Likelihood and the method of Least Squares are fundamental procedures in statistics. The study of asymptotic consistency of Maximum Likelihood has been central to the field for almost a century (Wald, 1949). Along with consistency, failures of Maximum Likelihood have been thoroughly investigated for nearly as long (Neyman and Scott, 1948; Bahadur, 1958; Ferguson, 1982). In the context of estimation of nonparametric models, the seminal work of Birgé and Massart (1993) provided sufficient conditions for minimax optimality (in a non-asymptotic sense) of Least Squares, while also presenting an example of a model class where this basic procedure is sub-optimal. Three decades later, we still do not have necessary and sufficient conditions for minimax optimality of Least Squares when the model class is large. While the present paper does not resolve this question, it takes several steps towards understanding the behavior of Least Squares, or equivalently Empirical Risk Minimization (ERM) with square loss, in large models. Beyond intellectual curiosity, the question of minimax optimality of Least Squares is driven by the desire to understand the current practice of fitting large or overparametrized models, such as neural networks, to data (cf.

Country:
- North America > United States
  - Rhode Island > Providence County > Providence (0.04)
- Europe > United Kingdom
  - England
    - Cambridgeshire > Cambridge (0.04)
    - Oxfordshire > Oxford (0.04)
- Asia > Middle East
  - Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.74)
  - Machine Learning
    - Statistical Learning (0.68)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.74)
