consistency result
Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum $\ell_1$-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
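To make the object of study concrete, here is a minimal sketch of a minimum Euclidean norm interpolator on Gaussian data with more features than samples. It only illustrates what "interpolator" and "minimum-norm" mean; it does not implement the paper's uniform convergence bound. The names `min_norm_interpolator`, `n`, `d`, `w_star`, and the noise level are illustrative assumptions, not taken from the abstract.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum l2-norm w with X @ w = y (X assumed full row rank)."""
    # The Moore-Penrose pseudoinverse picks the least-norm solution
    # of an underdetermined consistent linear system.
    return np.linalg.pinv(X) @ y


rng = np.random.default_rng(0)
n, d = 5, 20                      # hypothetical sizes with n < d
X = rng.standard_normal((n, d))   # Gaussian design
w_star = np.zeros(d)
w_star[0] = 1.0                   # hypothetical ground-truth signal
y = X @ w_star + 0.1 * rng.standard_normal(n)

w_hat = min_norm_interpolator(X, y)

# Any other interpolator differs from w_hat by a null-space component,
# which can only increase the Euclidean norm.
v = rng.standard_normal(d)
v_null = v - np.linalg.pinv(X) @ (X @ v)   # projection of v onto null(X)
w_other = w_hat + v_null
```

Both `w_hat` and `w_other` fit the training data exactly; the abstract's results concern how the generalization error of such interpolators depends on their norm.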
To Reviewer 1
We thank the reviewers for the helpful comments and feedback. Our responses are detailed below. We will make the suggested edits for clarity. The improved interpretability with little loss of accuracy makes the sparse TBM appealing in applications. We agree with the reviewer that MSE is not the best metric for clustering.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper studies the statistical consistency of plug-in classifiers under non-decomposable performance measures such as the F-measure, which is a popular performance measure in machine learning. The problem studied in this paper is complex because non-decomposable measures cannot, by definition, be expressed as an empirical expectation. Therefore, the usual concentration inequalities are not applicable in this scenario. The authors present a general analysis for measures that can be expressed as a continuous function of the true positive rate and the true negative rate, as well as the class probability.
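As an illustration of a measure of this form, the standard $F_1$ score can be written in terms of the true positive rate, the true negative rate, and the positive class probability. This is a worked identity from the usual definitions of precision and recall, not a formula quoted from the paper; the symbol $p = \Pr(y = 1)$ is introduced here for illustration.

```latex
\mathrm{Prec} = \frac{p\,\mathrm{TPR}}{p\,\mathrm{TPR} + (1-p)(1-\mathrm{TNR})},
\qquad
\mathrm{Rec} = \mathrm{TPR},
\qquad
F_1 = \frac{2\,\mathrm{Prec}\cdot\mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}
    = \frac{2\,p\,\mathrm{TPR}}{p + p\,\mathrm{TPR} + (1-p)(1-\mathrm{TNR})}.
```

The right-hand side is continuous in $(\mathrm{TPR}, \mathrm{TNR})$ whenever $p > 0$, but it is a ratio rather than an average of per-example losses, which is why it is non-decomposable.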
On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures
Harikrishna Narasimhan, Rohit Vaish, Shivani Agarwal
We study consistency properties of algorithms for non-decomposable performance measures that cannot be expressed as a sum of losses on individual data points, such as the F-measure used in text retrieval and several other performance measures used in class imbalanced settings. While there has been much work on designing algorithms for such performance measures, there is limited understanding of the theoretical properties of these algorithms. Recently, Ye et al. (2012) showed consistency results for two algorithms that optimize the F-measure, but their results apply only to an idealized setting, where precise knowledge of the underlying probability distribution (in the form of the 'true' posterior class probability) is available to a learning algorithm. In this work, we consider plug-in algorithms that learn a classifier by applying an empirically determined threshold to a suitable 'estimate' of the class probability, and provide a general methodology to show consistency of these methods for any non-decomposable measure that can be expressed as a continuous function of true positive rate (TPR) and true negative rate (TNR), and for which the Bayes optimal classifier is the class probability function thresholded suitably. We use this template to derive consistency results for plug-in algorithms for the F-measure and for the geometric mean of TPR and precision; to our knowledge, these are the first such results for these measures. In addition, for continuous distributions, we show consistency of plug-in algorithms for any performance measure that is a continuous and monotonically increasing function of TPR and TNR. Experimental results confirm our theoretical findings.
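The plug-in idea described above can be sketched as follows: take class probability estimates from any model, then choose the threshold that maximizes the empirical F-measure on a labeled sample. This is a minimal illustration under assumptions, not the authors' algorithm or experimental setup; the function names `empirical_f1` and `plug_in_threshold`, the exhaustive search over observed scores, and the toy data are all hypothetical.

```python
import numpy as np


def empirical_f1(pred, labels):
    """F1 computed from confusion counts: 2TP / (2TP + FP + FN)."""
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2.0 * tp / denom


def plug_in_threshold(scores, labels):
    """Pick the threshold on estimated class probabilities that maximizes
    empirical F1. Candidate thresholds are the observed score values."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_f = None, -1.0
    for t in np.unique(scores):
        f = empirical_f1(scores >= t, labels)
        if f > best_f:
            best_t, best_f = float(t), f
    return best_t, best_f


# Toy example: estimated class probabilities and true labels (made up).
scores = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
t_hat, f_hat = plug_in_threshold(scores, labels)
```

The resulting classifier predicts the positive class whenever the estimated class probability is at least `t_hat`. In practice the threshold would typically be tuned on data held out from the sample used to fit the probability estimate.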
Reviews: Robust k-means: a Theoretical Revisit
In this paper the authors study theoretical properties of the robust k-means (RKM) formulation proposed in [5,23]. They first study the robustness property, showing that if the f_\lambda function is convex, then one outlier is sufficient to break down the algorithm, and if f_\lambda need not be convex, then two outliers can break down the algorithm. On the other hand, under some structural assumptions on the non-outliers, a non-trivial breakdown point can be established for RKM. The authors then study the consistency issue, generalising consistency results that are known for convex f_\lambda to non-convex f_\lambda. My main concern with the paper is that the results appear very specific, and I am not entirely sure whether they will appeal to a more general audience in machine learning.