Review for NeurIPS paper: Cross-validation Confidence Intervals for Test Error


Weaknesses:

Major comments:

1) The connection to algorithmic stability is interesting, but I am not convinced that it can deliver results as strong as we would like beyond what is already achievable through standard analysis. More specifically, algorithmic stability has mostly yielded O(1/n) results for ERM or SGD, but these are essentially rehashings of standard results, following from iid-ness, i.e., from the fact that every datapoint contributes the same information on average. This is not a problem with the current paper per se, but rather a critique of algorithmic stability analysis in general. My concern for the current paper is therefore twofold: a) the connection to algorithmic stability cannot, as far as I understand, deliver any stronger results than what is already possible through standard methods; b) consequently, a basic CLT for the CV error is attainable through a more standard analysis. Indeed, the path to asymptotic normality in the paper is fairly straightforward, since all the important steps are more or less assumed: square integrability of the mean loss \bar h_n, strong convexity of the loss function (which guarantees O(1/n) rates), etc.

2) The experimental setup is very confusing to me.
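To make the claim in 1(b) concrete, here is a sketch of the standard argument I have in mind (the notation beyond \bar h_n is my own, not the paper's):

$$
\sqrt{n}\left(\widehat{\mathrm{Err}}_{\mathrm{CV}} - \mathbb{E}[\bar h_n(Z)]\right)
= \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} \bar h_n(Z_i) - \mathbb{E}[\bar h_n(Z)]\right) + o_P(1),
$$

where the remainder is $o_P(1)$ because the $O(1/n)$ stability rate (guaranteed here by strong convexity) makes the CV estimate differ from the iid average by $O_P(1/n) = o_P(n^{-1/2})$, and the leading term then satisfies a CLT for triangular arrays given square integrability of $\bar h_n(Z)$. Nothing beyond the assumed $O(1/n)$ rate appears to be needed.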