Reviews: Population Matching Discrepancy and Applications in Deep Learning

Neural Information Processing Systems 

This paper presents the population matching discrepancy (PMD) as a better alternative to MMD for distribution matching applications. It is shown that PMD is a sampled version of Wasserstein metric or earth mover's distance, and it has a few advantages over MMD, most notably stronger gradients and the applicability of smaller mini-batch sizes, and fewer hyperparameters. For training generative models at least, the MMD metric does suffer from weak gradients and the requirement of large mini-batches, the proposals in this paper therefore provides a nice solution to both of these problems. The small mini-batch claim is verified quite nicely in the empirical results. The verification of the stronger gradients claim is less satisfactory, since the MMD metric depends on the scale parameter sigma, it is essential to consider either the best sigma or a range of sigmas when making such a claim. In terms of having fewer hyper-parameters, I feel this claim is less well-supported, because PMD depends on a distance metric, and this distance metric might contain extra hyperparameters as well as in the MMD case.