Review for NeurIPS paper: Minimax Classification with 0-1 Loss and Performance Guarantees

Neural Information Processing Systems 

This paper presents an interesting new perspective on the design of learning methods: the idea is to choose a classifier that minimizes the worst-case risk over a family of distributions constructed from an i.i.d. data set, with the guarantee that, with high probability, the true data-generating distribution is contained in the family. This construction inherently supplies an upper bound on the risk of the chosen classifier. The family of distributions is defined by constraints on the expectation of a function Phi of (x, y), with data-dependent confidence bounds on the true expectation setting the constraints. The method is therefore highly dependent on the choice of Phi. One significant concern noted by the reviewers is that the paper does not explore this dependence in much depth. In particular, the paper would benefit from an array of illustrative examples and design principles for Phi, a discussion of how choices of Phi for a given sample size relate to notions of expressiveness and overfitting, and a check of whether the technique can provide guarantees competitive with known results obtained by more traditional approaches (e.g., kernel methods, or ERM guarantees from uniform convergence).
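To make the construction concrete for readers of this review: the family of distributions can be pictured as all P whose feature expectation E_P[Phi(x, y)] lies within a data-dependent confidence band around the empirical mean. The sketch below is not the paper's implementation; it assumes a hypothetical feature map `phi` with coordinates bounded in [0, 1] and uses a per-coordinate Hoeffding interval to set the band.

```python
import numpy as np

def phi(x, y, n_classes=2):
    # Hypothetical feature map (an assumption for illustration): the raw
    # feature vector placed in the block corresponding to label y, so the
    # map depends jointly on (x, y). Coordinates assumed in [0, 1].
    d = x.shape[0]
    out = np.zeros(d * n_classes)
    out[y * d:(y + 1) * d] = x
    return out

def uncertainty_set_bounds(X, Y, delta=0.05):
    """Confidence band [tau_low, tau_high] on the true expectation E[phi(x, y)].

    The distribution family is every P with tau_low <= E_P[phi] <= tau_high
    coordinate-wise. By Hoeffding's inequality for [0, 1]-bounded coordinates,
    each coordinate's band contains the true expectation with probability
    at least 1 - delta.
    """
    n = len(Y)
    feats = np.array([phi(x, y) for x, y in zip(X, Y)])
    tau_hat = feats.mean(axis=0)                       # empirical mean of phi
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))     # Hoeffding radius
    return tau_hat - eps, tau_hat + eps

# Example: band from 100 synthetic samples with 3 features and 2 classes.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
Y = rng.integers(0, 2, size=100)
lo, hi = uncertainty_set_bounds(X, Y)
```

The minimax classifier would then minimize the worst-case 0-1 risk over all distributions satisfying these moment constraints; shrinking `delta` widens the band, trading a larger family (and a looser risk bound) for a stronger coverage guarantee.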