On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests

Ramdas, Aaditya, Garcia, Nicolas, Cuturi, Marco

arXiv.org Machine Learning 

Nonparametric two sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being intelligently designed and analyzed, both for the unidimensional and the multivariate setting. Our contribution is to tie together many of these tests, drawing connections between seemingly very different statistics. In this work, our central object is the Wasserstein distance, as we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel based maximum mean discrepancy. Some connections proceed through the construction of a \textit{smoothed} Wasserstein distance, and others through the pursuit of a "distribution-free" Wasserstein test. Some observations in this chain are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found