approaching bayes error
Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
Specifically their rigorous results concern the model: for i 1,2,...,n Y_i sign( X_i,w *) for Gaussian prior on w * and Gaussian X_i where they live on d dimensions. They assume n/d \alpha (constant) and n,d grow to infinity. In [10] the Bayes optimal reconstruction error has been studied (verifying a stats physics prediction) and here they discuss about the performance of regularize ERM (potentially convex methods) to achieve it. Their first set of results is about the performance of any \ell_2 regularized convex loss and showing that their performance can be tracked using a fixed point equation. The result is based on Gordon's minimax theory and is shown then to be verifying also the replica (stats physics) prediction.
Review for NeurIPS paper: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
All the reviewers agreed that the main results presented in this paper, the rigorous fixed-point equations for binary classification with generic loss and l2 regularizer, and more in-depth elucidation for three losses (ridge, hinge, and logistic), are sound and interesting. Although the problem setting may be thought as simple and limited, the findings in this paper are rigorous and non-trivial, which is the strength of this paper. In this regard, clarification on what statements are rigorous and what are not should be important. Some reviewers pointed out that it would be nicer if a general criterion telling if a particular loss would achieve the rate \propto \alpha {-1} be provided. I think that this point would be worth mentioning in this paper.
Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer non-linear neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where \alpha \frac{n}{d} is kept finite in the limit of a high dimension d and number of samples n . Our contribution is three-fold: First, we prove a formula for the generalization error achieved by \ell_2 regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Secondly, focussing on commonly used loss functions and optimizing the \ell_2 regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely.