Reviews: Minimax Statistical Learning with Wasserstein distances

Neural Information Processing Systems 

The paper investigates a minimax framework for statistical learning in which the goal is to minimize the worst-case population risk over the family of distributions that lie within a prescribed Wasserstein distance of the unknown data-generating distribution. The authors develop data-dependent generalization bounds and, under smoothness assumptions, data-independent excess risk bounds. These apply to the setting where classical empirical risk minimization (ERM) is replaced by a robust procedure that minimizes the worst-case empirical risk over the distributions contained in a Wasserstein ball centered at the empirical distribution of the data. The statistical minimax framework investigated by the authors resembles in spirit the one introduced in [9], where the ambiguity set is defined via moment constraints instead of the Wasserstein distance. The paper is well-written, with accurate references to previous literature and an extensive use of remarks to guide the development of the theory. The contributions are clearly emphasized, and the math is solid.
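
For concreteness, the robust objective summarized above can be sketched in symbols. The notation is an assumption made for illustration and is not taken from the review: P is the unknown data-generating distribution, P_n the empirical distribution of n samples, F the hypothesis class, W_p the p-Wasserstein distance, and \varrho the radius of the ambiguity ball.

```latex
% Illustrative notation only; the symbols below are assumptions, not the review's.
% Worst-case (local minimax) risk of a hypothesis f around a distribution P:
\[
  R_{\varrho,p}(P, f) \;=\; \sup_{Q \,:\, W_p(P, Q) \le \varrho} \mathbb{E}_{Q}\bigl[f(Z)\bigr].
\]
% Robust counterpart of ERM: minimize the worst-case risk around the empirical distribution P_n.
\[
  \hat{f} \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{F}} R_{\varrho,p}(P_n, f).
\]
% Excess risk of the learned hypothesis, measured in the worst-case population risk:
\[
  R_{\varrho,p}(P, \hat{f}) \;-\; \inf_{f \in \mathcal{F}} R_{\varrho,p}(P, f).
\]
```

Under this reading, setting \varrho = 0 collapses the ball to the single distribution P_n and recovers ordinary ERM, which is one way to see the procedure as a robust generalization of it.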