Review for NeurIPS paper: Projection Robust Wasserstein Distance and Riemannian Optimization


Summary and Contributions: The Wasserstein distance emerges from the optimal transport (OT) problem and is a powerful metric for comparing two probability measures, as it offers nice theoretical properties and relevant practical implications. However, it has major limitations in large-scale settings: since the Wasserstein distance is defined as the solution of a linear program, its computation rapidly becomes prohibitive as the problem size grows; moreover, its sample complexity can grow exponentially in the problem dimension. These unfavorable properties have motivated the development of "computational OT" methods in recent years, which define alternatives to the Wasserstein distance with better computational and/or statistical properties, and therefore enable the use of OT in machine learning applications. One recently proposed and increasingly popular approach consists in computing the Wasserstein distance between lower-dimensional representations of the two distributions to compare. Specifically, the Projection Robust Wasserstein (PRW) distance (also known as Wasserstein Projection Pursuit) builds these representations by orthogonally projecting the d-dimensional distributions onto a k-dimensional subspace (k < d) chosen such that the Wasserstein distance between the k-dimensional reductions is maximized.
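For concreteness, the construction summarized above can be written as follows (a sketch in my own notation, not copied from the paper):

```latex
% PRW distance between measures \mu, \nu on \mathbb{R}^d,
% maximizing over k-dimensional orthogonal projections:
\mathcal{P}_k(\mu, \nu)
  \;=\;
  \max_{U \in \mathrm{St}(d,k)}
  W_2\!\left( U^\top_{\#}\mu,\; U^\top_{\#}\nu \right),
\qquad
\mathrm{St}(d,k) := \{ U \in \mathbb{R}^{d \times k} : U^\top U = I_k \},
```

where $U^\top_{\#}\mu$ denotes the pushforward of $\mu$ under the projection $x \mapsto U^\top x$. Since $W_2$ is itself a minimization over transport plans, this is a max-min problem over the Stiefel manifold, which is what motivates the Riemannian optimization machinery in the paper's title.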