Cross Validation
Response to Reviewer #1: The error bars in Figure 2 correspond to the variance from the 5-fold cross-validation results
We thank all the reviewers for their valuable and positive feedback. All minor comments (such as typos and notation) pointed out by the reviewers will be addressed. Reviewer comment: "There are error bars in one plot, but no mention I saw of what they indicate." The error bars in Figure 2 correspond to the variance of the 5-fold cross-validation results. Since the embeddings are created by predicting the attributes of other nodes in a path ..., we have explicitly focused on molecules (e.g., as pointed out in the ...). Reviewer comment: "The baselines also look weak."
Bayes beats Cross Validation: Fast and Accurate Ridge Regression via Expectation Maximization
We present a novel method for tuning the regularization hyper-parameter, λ, of a ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV) while yielding estimates of the regression parameters of equal or, particularly in the setting of sparse covariates, superior quality to those obtained by minimising the LOOCV risk. The LOOCV risk can suffer from multiple bad local minima for finite n and thus requires the specification of a set of candidate λ values, which can fail to provide good solutions. In contrast, we show that the proposed method is guaranteed to find a unique optimal solution for large enough n, under relatively mild conditions, without requiring the specification of any difficult-to-determine hyper-parameters. This is based on a Bayesian formulation of ridge regression that we prove to have a unimodal posterior for large enough n, allowing both the optimal λ and the regression coefficients to be learned jointly within an iterative expectation maximization (EM) procedure. Importantly, we show that, with an appropriate preprocessing step, a single iteration of the main EM loop can be implemented in O(min(n, p)) operations for input data with n rows and p columns.
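For concreteness, the following is a minimal sketch of a textbook-style EM scheme for Bayesian ridge regression that jointly updates the coefficients and λ = σ²/τ². The updates and the naive O(p³) per-iteration cost are illustrative assumptions, not the paper's exact algorithm, which relies on a preprocessing step to reach O(min(n, p)) per iteration.

```python
# Hedged sketch: EM for Bayesian ridge regression, jointly updating the
# coefficients and lambda = sigma^2 / tau^2. Textbook-style updates, not
# necessarily the paper's exact procedure.
import numpy as np

def em_ridge(X, y, n_iter=200, tol=1e-8):
    """Jointly estimate ridge coefficients and lambda = sigma^2 / tau^2 via EM."""
    n, p = X.shape
    sigma2, tau2 = 1.0, 1.0              # noise and prior variances (initial guesses)
    XtX, Xty = X.T @ X, X.T @ y
    lam = sigma2 / tau2
    for _ in range(n_iter):
        # E-step: Gaussian posterior over the coefficients given current variances
        A = XtX + lam * np.eye(p)
        m = np.linalg.solve(A, Xty)          # posterior mean
        Sigma = sigma2 * np.linalg.inv(A)    # posterior covariance
        # M-step: update the variances from the posterior moments
        tau2 = (m @ m + np.trace(Sigma)) / p
        resid = y - X @ m
        sigma2 = (resid @ resid + np.trace(XtX @ Sigma)) / n
        lam_new = sigma2 / tau2
        if abs(lam_new - lam) < tol * max(lam, 1.0):
            lam = lam_new
            break
        lam = lam_new
    return m, lam
```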
Cross-validation Confidence Intervals for Test Error
Pierre Bayle
This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
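As an illustration of the kind of interval these central limit theorems justify, here is a minimal sketch: a normal interval built from pooled per-example test losses across folds. The model, the 0-1 loss, and the simple pooled variance estimate are illustrative assumptions; the paper's consistent variance estimator handles dependence across folds under stability conditions.

```python
# Hedged sketch: a CLT-style 95% confidence interval for k-fold test error,
# built from pooled per-example 0-1 losses. Illustrates the shape of the
# interval only; the paper's variance estimator is more careful.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def kfold_error_ci(X, y, k=10, seed=0):
    """95% normal interval for k-fold test error from pooled 0-1 losses."""
    losses = []
    for train, test in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        losses.extend((model.predict(X[test]) != y[test]).astype(float))
    losses = np.asarray(losses)
    err = losses.mean()
    half = 1.96 * losses.std(ddof=1) / np.sqrt(len(losses))   # z_{0.975} ~ 1.96
    return err, (err - half, err + half)
```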
Approximate cross-validation (ACV) methods may be slow and inaccurate in GLM problems with high data dimension
We are grateful to the reviewers for their helpful feedback. We provide an efficiently computable upper bound on the error of our ACV method. We agree with R1 that we do not focus on asymptotic analysis. R1 suggests using principal component analysis (PCA) to reduce the dimensionality of the covariate matrix. However, for (2), there is real interest in the full-covariate GLM.
Is Cross-validation the Gold Standard to Estimate Out-of-sample Model Performance?
Cross-Validation (CV) is the default choice for estimating the out-of-sample performance of machine learning models. Despite its wide usage, its statistical benefits remain only half-understood, especially in challenging nonparametric regimes. In this paper we fill in this gap and show that, in terms of estimating out-of-sample performance, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach, in which one reuses the training data for testing evaluation. Specifically, in terms of both the asymptotic bias and the coverage accuracy of the associated interval for out-of-sample evaluation, K-fold CV provably cannot outperform plug-in, regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias than plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases leave-one-out again does not outperform plug-in once this variability is taken into account.
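To make the comparison concrete, the sketch below contrasts the plug-in estimate (refit on all data, evaluate on the same data) with 5-fold CV on a synthetic ridge regression. The data-generating process and model are illustrative assumptions; the paper's claims concern asymptotic bias and interval coverage, not any single simulation.

```python
# Hedged sketch: "plug-in" (reuse training data for evaluation) versus
# 5-fold CV as estimates of out-of-sample MSE, on assumed synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + rng.normal(size=500)

# Plug-in: fit on all data and evaluate on the same data.
model = Ridge(alpha=1.0).fit(X, y)
plug_in_mse = np.mean((model.predict(X) - y) ** 2)

# K-fold CV estimate of the same out-of-sample MSE.
cv_mse = -cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                          scoring="neg_mean_squared_error").mean()
print(f"plug-in MSE: {plug_in_mse:.3f}   5-fold CV MSE: {cv_mse:.3f}")
```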
Approximate Cross-Validation for Structured Models
Many modern data analyses benefit from explicitly modeling dependence structure in data -- such as measurements across time or space, ordered words in a sentence, or genes in a genome. A gold standard evaluation technique is structured cross-validation (CV), which leaves out some data subset (such as data within a time interval or data in a geographic region) in each fold. But CV here can be prohibitively slow due to the need to re-run already-expensive learning algorithms many times. Previous work has shown approximate cross-validation (ACV) methods provide a fast and provably accurate alternative in the setting of empirical risk minimization. But this existing ACV work is restricted to simpler models by the assumptions that (i) data across CV folds are independent and (ii) an exact initial model fit is available. In structured data analyses, both these assumptions are often untrue.
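As a point of reference, structured CV folds of the kind described here hold out whole dependence blocks rather than random rows. The sketch below is a minimal, assumed setup (group labels such as years standing in for time intervals or geographic regions), not the paper's ACV method.

```python
# Hedged sketch: structured CV folds that leave out whole blocks of dependent
# data (e.g., a time interval or a region) rather than random rows. The group
# labels here are hypothetical.
import numpy as np

def structured_folds(groups):
    """Yield (train_idx, test_idx), holding out one dependence block per fold."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.where(groups != g)[0], np.where(groups == g)[0]

# Example: hold out one year of observations at a time.
years = np.repeat([2019, 2020, 2021, 2022], 25)
for train_idx, test_idx in structured_folds(years):
    pass  # re-fit the (expensive) structured model on train_idx, score on test_idx
```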
Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression
Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high.
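A minimal way to see this phenomenon empirically, on assumed synthetic data, is to scan the exact LOOCV loss of ridge regression over a λ grid using the standard hat-matrix shortcut e_i / (1 − H_ii) and count interior local minima. The data-generating choices below are illustrative, not the paper's construction.

```python
# Hedged sketch: scan the exact LOOCV loss of ridge regression over a lambda
# grid via the hat-matrix shortcut and count interior local minima on the grid.
import numpy as np

def loocv_ridge(X, y, lam):
    """Exact LOOCV mean squared error for ridge via the hat-matrix shortcut."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10)) * rng.uniform(0.1, 3.0, size=10)  # uneven spectrum
y = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=50)

lams = np.logspace(-4, 4, 200)
risks = np.array([loocv_ridge(X, y, lam) for lam in lams])
is_local_min = (risks[1:-1] < risks[:-2]) & (risks[1:-1] < risks[2:])
print("interior local minima on the grid:", int(is_local_min.sum()))
```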
Review for NeurIPS paper: Cross-validation Confidence Intervals for Test Error
Weaknesses: Some major comments:
1) The connection to algorithmic stability is interesting, but I am not convinced that it can deliver results as strong as we would like beyond what can already be achieved through standard results/analysis. More specifically, algorithmic stability has mostly yielded O(1/n) results for ERM or SGD, but this is just a rehashing of standard results, essentially following from iid-ness, that is, from the fact that every datapoint contributes the same information on average. This is not a problem with the current paper per se, but more a critique of algorithmic stability analysis. Rather, my concern for the current paper is twofold: a) the connection to algorithmic stability cannot deliver, as far as I understand, any stronger results than what is already possible through standard methods; and b) thus a basic CLT for the CV error is attainable through a more standard analysis. Indeed, the path to asymptotic normality is fairly straightforward in the paper, since all the important steps are more or less assumed: square integrability of the mean loss \bar h_n, strong convexity of that loss function (which guarantees O(1/n) rates), etc.
2) The experimental setup is very confusing to me.
Review for NeurIPS paper: Cross-validation Confidence Intervals for Test Error
The reviewers were all rather positive about the theoretical contribution, although one minority negative review (R1) gave a low score due to an experimental setup it deemed unconvincing. Overall I recommend acceptance, possibly asking the authors to make some revisions to the experimental section to address R1's criticisms.