Manole, Tudor
Stability Bounds for Smooth Optimal Transport Maps and their Statistical Implications
Balakrishnan, Sivaraman, Manole, Tudor
We study estimators of the optimal transport (OT) map between two probability distributions. We focus on plugin estimators derived from the OT map between estimates of the underlying distributions. We develop novel stability bounds for OT maps which generalize those in past work, and allow us to reduce the problem of optimally estimating the transport map to that of optimally estimating densities in the Wasserstein distance. In contrast, past work provided a partial connection between these problems and relied on regularity theory for the Monge-Ampère equation to bridge the gap, a step which required unnatural assumptions to obtain sharp guarantees. We also provide some new insights into the connections between stability bounds which arise in the analysis of plugin estimators and growth bounds for the semi-dual functional which arise in the analysis of Brenier potential-based estimators of the transport map. We illustrate the applicability of our new stability bounds by revisiting the smooth setting studied by Manole et al., analyzing two of their estimators under more general conditions. Critically, our bounds do not require smoothness or boundedness assumptions on the underlying measures. As an illustrative application, we develop and analyze a novel tuning parameter-free estimator for the OT map between two strongly log-concave distributions.
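To fix notation (an illustrative summary rather than the paper's exact statements): for the quadratic cost, and provided $P$ has a density, Brenier's theorem identifies the OT map from $P$ to $Q$ as the gradient of a convex potential, and stability bounds of the kind described above control, schematically, how far a plugin map can drift when the marginals are perturbed in Wasserstein distance:
\[
T_0 \;=\; \nabla\varphi_0 \;\in\; \arg\min_{T \,:\, T_\# P = Q} \int_{\mathbb{R}^d} \|x - T(x)\|^2 \, dP(x),
\qquad
\|\hat{T} - T_0\|_{L^2(P)}^2 \;\lesssim\; W_2(\hat{P}, P)^{\alpha} + W_2(\hat{Q}, Q)^{\alpha},
\]
where $\hat{T}$ denotes the OT map between estimates $\hat{P}, \hat{Q}$ of the marginals. The exponent $\alpha$ and the hidden constants depend on the regularity and curvature assumptions made in the paper; the display is only meant to indicate the shape of the reduction from map estimation to estimation of the marginals in Wasserstein distance.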
Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models
Manole, Tudor, Ho, Nhat
We revisit convergence rates for maximum likelihood estimation (MLE) under finite mixture models. The Wasserstein distance has become a standard loss function for the analysis of parameter estimation in these models, due in part to its ability to circumvent label switching and to accurately characterize the behaviour of fitted mixture components with vanishing weights. However, the Wasserstein metric is only able to capture the worst-case convergence rate among the remaining fitted mixture components. We demonstrate that when the log-likelihood function is penalized to discourage vanishing mixing weights, stronger loss functions can be derived to resolve this shortcoming of the Wasserstein distance. These new loss functions accurately capture the heterogeneity in convergence rates of fitted mixture components, and we use them to sharpen existing pointwise and uniform convergence rates in various classes of mixture models. In particular, these results imply that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work. We further show that some of these conclusions extend to the traditional MLE. Our theoretical findings are supported by a simulation study to illustrate these improved convergence rates.
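For orientation (standard background rather than this paper's contribution), the Wasserstein loss mentioned above is a distance between mixing measures: writing a fitted and a true mixing measure as atomic measures on the parameter space,
\[
\hat{G} = \sum_{j=1}^{K} \hat{\pi}_j \, \delta_{\hat{\theta}_j}, \qquad
G_0 = \sum_{\ell=1}^{K_0} \pi^0_\ell \, \delta_{\theta^0_\ell}, \qquad
W_1(\hat{G}, G_0) = \inf_{q} \sum_{j,\ell} q_{j\ell} \, \|\hat{\theta}_j - \theta^0_\ell\|,
\]
with the infimum taken over couplings $q$ of the weight vectors $(\hat{\pi}_j)$ and $(\pi^0_\ell)$. Since this is a single aggregate quantity, it is dominated by the slowest-converging fitted component, which is precisely the shortcoming the loss functions constructed in the paper are designed to overcome.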
Plugin Estimation of Smooth Optimal Transport Maps
Manole, Tudor, Balakrishnan, Sivaraman, Niles-Weed, Jonathan, Wasserman, Larry
We analyze a number of natural estimators for the optimal transport map between two distributions and show that they are minimax optimal. We adopt the plugin approach: our estimators are simply optimal couplings between measures derived from our observations, appropriately extended so that they define functions on $\mathbb{R}^d$. When the underlying map is assumed to be Lipschitz, we show that computing the optimal coupling between the empirical measures, and extending it using linear smoothers, already gives a minimax optimal estimator. When the underlying map enjoys higher regularity, we show that the optimal coupling between appropriate nonparametric density estimates yields faster rates. Our work also provides new bounds on the risk of corresponding plugin estimators for the quadratic Wasserstein distance, and we show how this problem relates to that of estimating optimal transport maps using stability arguments for smooth and strongly convex Brenier potentials. As an application of our results, we derive a central limit theorem for a density plugin estimator of the squared Wasserstein distance, which is centered at its population counterpart when the underlying distributions have sufficiently smooth densities. In contrast to known central limit theorems for empirical estimators, this result easily lends itself to statistical inference for Wasserstein distances.
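The following is a minimal sketch of the plugin idea described above, under simplifying assumptions (equal sample sizes, uniform weights, and a nearest-neighbor extension rather than the linear smoothers or nonparametric density estimates used in the paper): compute the optimal coupling between the two empirical measures and extend it to a map on $\mathbb{R}^d$.

```python
# Minimal sketch of an empirical plugin OT map estimator (not the paper's exact
# construction). With equal sample sizes and uniform weights, the optimal
# coupling reduces to an assignment problem; the coupling is then extended to
# all of R^d by nearest-neighbor interpolation.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import cKDTree

def empirical_ot_map(X, Y):
    """X, Y: (n, d) samples from P and Q. Returns a function T_hat: R^d -> R^d."""
    # Squared-Euclidean cost matrix between the two samples.
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    row, col = linear_sum_assignment(cost)
    targets = Y[col]                      # T_hat(X_i) = Y_{sigma(i)}
    tree = cKDTree(X[row])
    def T_hat(x):
        # Zeroth-order (nearest-neighbor) extension of the coupling to new points.
        _, idx = tree.query(np.atleast_2d(x))
        return targets[idx]
    return T_hat

# Illustrative usage on synthetic Gaussian data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = rng.normal(loc=2.0, size=(500, 2))
T_hat = empirical_ot_map(X, Y)
print(T_hat(np.zeros(2)))
```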
Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations
Manole, Tudor, Ramdas, Aaditya
We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid inference at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size if applicable).
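For orientation (a standard definition, not specific to this paper): a confidence sequence for a divergence $D(P, Q)$ at level $1-\alpha$ is a sequence of data-dependent sets $(C_t)_{t \ge 1}$ satisfying
\[
\Pr\big( D(P, Q) \in C_t \ \text{ for all } t \ge 1 \big) \;\ge\; 1 - \alpha,
\]
so that coverage holds simultaneously over time and monitoring may be stopped at any data-dependent time. The iterated logarithmic price means, schematically, that a fixed-sample width of order $n^{-1/2}$ is replaced by a running width of order $\sqrt{\log\log t \,/\, t}$, with the same dependence on problem parameters.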
Uniform Convergence Rates for Maximum Likelihood Estimation under Two-Component Gaussian Mixture Models
Manole, Tudor, Ho, Nhat
Finite mixture models are a widely-used tool for modeling heterogeneous data, consisting of hidden subpopulations with distinct distributions. For applications exhibiting continuous data, location-scale Gaussian mixtures are arguably the most popular family of parametric mixture models. Beyond their broad applications as a modeling and clustering tool in the social, physical and life sciences (McLachlan & Peel 2004), Gaussian mixtures provide a flexible approach to density estimation (Genovese & Wasserman 2000, Ghosal & van der Vaart 2001). Estimating the parameters of a mixture model is crucial for quantifying the underlying heterogeneity of the data. One of the most widely-used approaches is the maximum likelihood estimator (MLE). A Gaussian mixture model with a known number of components K, all of which are well-separated, forms a regular parametric model for which the MLE achieves the standard parametric estimation rate (Ho & Nguyen 2016b, Chen 2017). Such rates are typically understood in terms of convergence of mixing measures, quantified using the Wasserstein distance as a means of avoiding label switching issues inherent in mixture modeling (Nguyen 2013).
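Concretely, the two-component location-scale Gaussian mixture of the title has density (standard notation, added here for orientation)
\[
p_G(x) \;=\; \pi \, \phi(x \mid \mu_1, \sigma_1^2) \;+\; (1 - \pi)\, \phi(x \mid \mu_2, \sigma_2^2),
\qquad
G \;=\; \pi\, \delta_{(\mu_1, \sigma_1^2)} + (1-\pi)\, \delta_{(\mu_2, \sigma_2^2)},
\]
where $\phi(\cdot \mid \mu, \sigma^2)$ denotes the Gaussian density. The MLE maximizes $\sum_{i=1}^n \log p_G(X_i)$ over such mixing measures $G$, and the convergence rates discussed above are stated in terms of Wasserstein distances between the fitted $\hat{G}_n$ and the true $G_0$.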
Estimating the Number of Components in Finite Mixture Models via the Group-Sort-Fuse Procedure
Manole, Tudor, Khalili, Abbas
Estimation of the number of components (or order) of a finite mixture model is a long-standing and challenging problem in statistics. We propose the Group-Sort-Fuse (GSF) procedure---a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models. Unlike methods which fit and compare mixtures with varying orders using criteria involving model complexity, our approach directly penalizes a continuous function of the model parameters. More specifically, given a conservative upper bound on the order, the GSF groups and sorts mixture component parameters to fuse those which are redundant. For a wide range of finite mixture models, we show that the GSF is consistent in estimating the true mixture order and achieves the $n^{-1/2}$ convergence rate for parameter estimation up to polylogarithmic factors. The GSF is implemented for several univariate and multivariate mixture models in the R package GroupSortFuse. Its finite sample performance is supported by a thorough simulation study, and its application is illustrated on two real data examples.
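A schematic form of the penalized objective described above (illustrative only; the exact grouping, sorting, and penalty are specified in the paper): with $K$ a conservative upper bound on the order,
\[
\hat{G} \;\in\; \arg\max_{G = \sum_{j=1}^{K} \pi_j \delta_{\theta_j}} \;
\sum_{i=1}^{n} \log p_G(X_i) \;-\; \sum_{j=2}^{K} r_\lambda\!\big( \|\theta_{(j)} - \theta_{(j-1)}\| \big),
\]
where $\theta_{(1)}, \dots, \theta_{(K)}$ are the component parameters after a data-driven sorting step and $r_\lambda$ is a sparsity-inducing penalty that shrinks the gaps between redundant adjacent components to zero, so that they fuse and the number of distinct atoms estimates the order.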
Minimax Confidence Intervals for the Sliced Wasserstein Distance
Manole, Tudor, Balakrishnan, Sivaraman, Wasserman, Larry
The Wasserstein distance has risen in popularity in the statistics and machine learning communities as a useful metric for comparing probability distributions. We study the problem of uncertainty quantification for the Sliced Wasserstein distance--an easily computable approximation of the Wasserstein distance. Specifically, we construct confidence intervals for the Sliced Wasserstein distance which have finite-sample validity under no assumptions or mild moment assumptions, and are adaptive in length to the smoothness of the underlying distributions. We also bound the minimax risk of estimating the Sliced Wasserstein distance, and show that the length of our proposed confidence intervals is minimax optimal over appropriate distribution classes. To motivate the choice of these classes, we also study minimax rates of estimating a distribution under the Sliced Wasserstein distance. These theoretical findings are complemented with a simulation study.
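A minimal Monte Carlo sketch of the Sliced Wasserstein distance between two samples, included for illustration: project both samples onto random directions and average the easily computable one-dimensional Wasserstein distances. The number of projections and the order of the distance are assumptions here; the paper's confidence intervals involve further ingredients not shown.

```python
# Monte Carlo estimate of the order-1 Sliced Wasserstein distance between two
# samples: average 1D Wasserstein-1 distances over random projection directions.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(X, Y, n_proj=200, rng=None):
    """X: (n, d), Y: (m, d) samples; returns a Monte Carlo sliced-distance estimate."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # uniform directions on the sphere
    # Average the one-dimensional Wasserstein-1 distances between projected samples.
    return np.mean([wasserstein_distance(X @ u, Y @ u) for u in dirs])

# Illustrative usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
Y = rng.normal(loc=0.5, size=(1000, 3))
print(sliced_wasserstein(X, Y))
```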