Optimal Transport for structured data

arXiv.org Machine Learning

Optimal transport has recently gained a lot of interest in the machine learning community thanks to its ability to compare probability distributions while respecting the geometry of the underlying space. The Wasserstein distance handles feature information through its metric or cost function, but fails to exploit structural information, i.e., the specific relations among the components of the distribution. Recently adapted to a machine learning context, the Gromov-Wasserstein distance defines a metric well suited for comparing distributions that live in different metric spaces by exploiting their inner structural information. In this paper we propose a new optimal transport distance, called the Fused Gromov-Wasserstein distance, capable of leveraging both structural and feature information by combining both views, and we prove its metric properties over very general manifolds. We also define the barycenter of structured objects as their Fréchet mean, leveraging both feature and structural information. We illustrate the versatility of the method on problems involving structured objects, computing barycenters in graph and time series contexts. We also use this new distance for graph classification, where we obtain results comparable or superior to those of state-of-the-art graph kernel methods and an end-to-end graph CNN approach.
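
To make the idea of combining the two views concrete, the following minimal sketch evaluates a fused objective for a fixed coupling. The abstract does not state the formula; the form used here (a convex combination, weighted by a trade-off parameter `alpha`, of a linear feature cost and a squared-loss Gromov-Wasserstein structure cost) is an assumption, and the names `fgw_cost`, `M`, `C1`, `C2` and `pi` are illustrative. It only scores a given coupling and does not optimize over couplings.

```python
import numpy as np

def fgw_cost(M, C1, C2, pi, alpha=0.5):
    """Fused objective for a FIXED coupling pi (assumed form, not a solver).

    M  : (n, m) feature cost between nodes of the two objects
    C1 : (n, n) structure matrix of the first object (e.g. shortest paths)
    C2 : (m, m) structure matrix of the second object
    pi : (n, m) coupling between the two node distributions
    """
    # Wasserstein-style term: cost of moving features along the coupling.
    feature_term = np.sum(M * pi)
    # Gromov-Wasserstein-style term:
    # sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 * pi[i,j] * pi[k,l].
    # The dense 4-D tensor is only practical for small n and m.
    L = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    structure_term = np.einsum("ijkl,ij,kl->", L, pi, pi)
    return (1 - alpha) * feature_term + alpha * structure_term

# Two copies of a 2-node graph with scalar node features 0 and 1.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([0.0, 1.0])
M = (x[:, None] - x[None, :]) ** 2

identity = np.diag([0.5, 0.5])                 # node i -> node i
swap = np.array([[0.0, 0.5], [0.5, 0.0]])      # node 0 <-> node 1

cost_identity = fgw_cost(M, C, C, identity, alpha=0.5)
cost_swap = fgw_cost(M, C, C, swap, alpha=0.5)
cost_swap_structure_only = fgw_cost(M, C, C, swap, alpha=1.0)
```

In this toy case the swapped coupling preserves the graph structure but mismatches the features, so it is penalized only when the feature term has nonzero weight. This is the kind of distinction a purely structural comparison cannot make.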


Computationally Efficient Tree Variants of Gromov-Wasserstein

arXiv.org Machine Learning

We propose two novel variants of Gromov-Wasserstein (GW) between probability measures in different probability spaces, based on projecting these measures into tree metric spaces. Our first proposed discrepancy, named flow-based tree Gromov-Wasserstein, hinges upon the tree metric from node to root in each tree to define the structure representation of probability measures on trees. The flow-based tree GW shares a similar structure with the univariate Wasserstein distance while retaining sufficient spatial information of the original projected probability measures. To further explore the tree structure, we propose another variant of flow-based tree GW, which we refer to as depth-based tree Gromov-Wasserstein. This discrepancy considers the alignment of probability measures hierarchically along each depth level of the tree structures. Finally, we demonstrate via extensive simulation studies on large-scale real data sets the relative advantage of the proposed discrepancies.


Sliced Gromov-Wasserstein

arXiv.org Machine Learning

Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions that do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex nonconvex quadratic program, which is usually very costly in both time and memory. In contrast to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large-scale optimization. Among the resulting methods, the Sliced Wasserstein (SW) distance exploits the direct solution of W on the line, which only requires sorting discrete samples in 1D. This paper proposes a new divergence based on GW, akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large-scale distributions via a slicing approach, and we show how it relates to the GW distance while being $O(n^2)$ to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle problems similar to those addressed by GW while being several orders of magnitude faster to compute.
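
The sorting-based solution of W on the line, and the slicing idea built on it, can be sketched as follows. This is a sketch of the classical Sliced Wasserstein distance only, not of the SGW discrepancy or its 1D closed form, which the abstract does not spell out. It assumes two samples of equal size with uniform weights. The function names and the Monte Carlo choice of random directions are illustrative.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1D samples.

    With uniform weights, the optimal 1D coupling matches the i-th
    smallest point of x to the i-th smallest point of y, so sorting
    is all that is needed.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

def sliced_wasserstein(X, Y, n_proj=50, p=2, seed=0):
    """Monte Carlo Sliced Wasserstein between point clouds X and Y (n, d)."""
    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere in R^d.
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction, solve the 1D problem, average W_p^p.
    vals = [wasserstein_1d(X @ t, Y @ t, p) ** p for t in theta]
    return np.mean(vals) ** (1.0 / p)

w_perm = wasserstein_1d([0.0, 1.0, 2.0], [2.0, 0.0, 1.0])   # same multiset
w_shift = wasserstein_1d([0.0, 0.0], [1.0, 1.0])            # unit shift

X = np.random.default_rng(1).normal(size=(100, 3))
sw_same = sliced_wasserstein(X, X[::-1])                     # reordered copy
```

Each projection costs one sort, which is why slicing scales to large samples. Unlike GW, this construction requires both point clouds to live in the same space, since the same directions are used to project both.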


Learning Generative Models across Incomparable Spaces

arXiv.org Machine Learning

Generative Adversarial Networks have shown remarkable success in learning a distribution that faithfully recovers a reference distribution in its entirety. However, in some cases, we may want to only learn some aspects (e.g., cluster or manifold structure), while modifying others (e.g., style, orientation or dimension). In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. A key component of our model is the Gromov-Wasserstein distance, a notion of discrepancy that compares distributions relationally rather than absolutely. While this framework subsumes current generative models in identically reproducing distributions, its inherent flexibility allows application to tasks in manifold learning, relational learning and cross-domain learning.


A polynomial-time relaxation of the Gromov-Hausdorff distance

arXiv.org Machine Learning

The Gromov-Hausdorff distance provides a metric on the set of isometry classes of compact metric spaces. Unfortunately, computing this metric directly is believed to be computationally intractable. Motivated by applications in shape matching and point-cloud comparison, we study a semidefinite programming relaxation of the Gromov-Hausdorff metric. This relaxation can be computed in polynomial time and, somewhat surprisingly, is itself a pseudometric. We describe the induced topology on the set of compact metric spaces. Finally, we demonstrate the numerical performance of various algorithms for computing the relaxed distance and apply these algorithms to several relevant data sets. In particular, we propose a greedy algorithm for finding the best correspondence between finite metric spaces that can handle hundreds of points.