Goto

Collaborating Authors

 regularized wasserstein distance


Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance

arXiv.org Machine Learning

We investigate the use of clustering methods on data produced by a stochastic simulator, with applications in anomaly detection, pre-optimization, and online monitoring. We introduce an agglomerative clustering algorithm that clusters multivariate empirical distributions using the regularized Wasserstein distance and apply the proposed methodology on a call-center model.


A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis

arXiv.org Artificial Intelligence

As widely used feature extraction approaches in machine learning, dimensionality reduction (DR) methods [53, 7, 20, 12] learn projections such that the projected lower dimensional subspaces maintain the coherent structure of datasets and reduce computational costs of classification or clustering. The linear projection obtained from linear DR methods takes the form of a matrix such that the embedding to the lower dimensional subspace only involves matrix multiplications. Due to such ease in interpretation and implementation, linear DR methods are often the favored choice among numerous DR methods. For example, principal component analysis (PCA) [24] seeks to find a linear projection that preserves the dataset's variation and is one of the most common and well-known DR methods. Other well-known DR methods include Fisher linear discriminant analysis (LDA) [24] to take into account the information of classes and compute a linear projection that best separates different classes, and Mahalanobis metric learning [35] to seek a distance metric that better models the relationship among dataset from a linear projection. Wasserstein discriminant analysis (WDA) [19] is a supervised linear DR that is based on the use of regularized Wasserstein distances [13] as a distance metric. Similar to Fisher linear discriminant analysis (LDA), WDA seeks a projection matrix to maximize the dispersion of projected points between different classes and minimize the dispersion of projected points within same classes.


Convergence and finite sample approximations of entropic regularized Wasserstein distances in Gaussian and RKHS settings

arXiv.org Machine Learning

This work studies the convergence and finite sample approximations of entropic regularized Wasserstein distances in the Hilbert space setting. Our first main result is that for Gaussian measures on an infinite-dimensional Hilbert space, convergence in the 2-Sinkhorn divergence is {\it strictly weaker} than convergence in the exact 2-Wasserstein distance. Specifically, a sequence of centered Gaussian measures converges in the 2-Sinkhorn divergence if the corresponding covariance operators converge in the Hilbert-Schmidt norm. This is in contrast to the previous known result that a sequence of centered Gaussian measures converges in the exact 2-Wasserstein distance if and only if the covariance operators converge in the trace class norm. In the reproducing kernel Hilbert space (RKHS) setting, the {\it kernel Gaussian-Sinkhorn divergence}, which is the Sinkhorn divergence between Gaussian measures defined on an RKHS, defines a semi-metric on the set of Borel probability measures on a Polish space, given a characteristic kernel on that space. With the Hilbert-Schmidt norm convergence, we obtain {\it dimension-independent} convergence rates for finite sample approximations of the kernel Gaussian-Sinkhorn divergence, with the same order as the Maximum Mean Discrepancy. These convergence rates apply in particular to Sinkhorn divergence between Gaussian measures on Euclidean and infinite-dimensional Hilbert spaces. The sample complexity for the 2-Wasserstein distance between Gaussian measures on Euclidean space, while dimension-dependent and larger than that of the Sinkhorn divergence, is exponentially faster than the worst case scenario in the literature.


Learning to Recommend via Inverse Optimal Matching

arXiv.org Machine Learning

We consider recommendation in the context of optimal matching, i.e., we need to pair or match a user with an item in an optimal way. The framework is particularly relevant when the supply of an individual item is limited and it can only satisfy a small number of users even though it may be preferred by many. We leverage the methodology of optimal transport of discrete distributions and formulate an inverse optimal transport problem in order to learn the cost which gives rise to the observed matching. It leads to a non-convex optimization problem which is solved by alternating optimization. A key novel aspect of our formulation is the incorporation of marginal relaxation via regularized Wasserstein distance, significantly improving the robustness of the method in the face of observed empirical matchings. Our model has wide applicability including labor market, online dating, college application recommendation. We back up our claims with experiments on both synthetic data and real world datasets.


Wasserstein Discriminant Analysis

arXiv.org Machine Learning

Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than cross-variance measures which have been usually considered, notably in LDA. Thanks to the the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; We show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.