Recovering metric from full ordinal information

arXiv.org Machine Learning

Given a geodesic space $(E, d)$, we show that full ordinal knowledge of the metric $d$, i.e. knowledge of the function $D_d : (w, x, y, z) \mapsto \mathbf{1}_{d(w,x) \le d(y,z)}$, determines the metric $d$ uniquely, up to a constant factor. For a subspace $E_n$ of $n$ points of $E$, converging in Hausdorff distance to $E$, we construct a metric $d_n$ on $E_n$ based only on the knowledge of $D_d$ on $E_n$, and establish a sharp upper bound on the Gromov-Hausdorff distance between $(E_n, d_n)$ and $(E, d)$.
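
As a concrete illustration, here is a minimal sketch of the comparison oracle $D_d$ together with a crude ordinal surrogate metric, assuming a finite Euclidean sample. The oracle interface matches the abstract; the rank-based surrogate below is a hypothetical stand-in, not the paper's construction of $d_n$.

    import numpy as np

    def make_ordinal_oracle(points):
        """Expose only D_d(w, x, y, z) = 1 if d(w, x) <= d(y, z);
        callers never observe the underlying distances."""
        def D(w, x, y, z):
            return float(np.linalg.norm(points[w] - points[x])
                         <= np.linalg.norm(points[y] - points[z]))
        return D

    def rank_surrogate(n, D):
        """Hypothetical ordinal surrogate (not the paper's d_n): score each
        pair by the fraction of pairs it dominates under D."""
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        d_hat = np.zeros((n, n))
        for (i, j) in pairs:
            s = sum(D(k, l, i, j) for (k, l) in pairs)
            d_hat[i, j] = d_hat[j, i] = s / len(pairs)
        return d_hat

    rng = np.random.default_rng(0)
    pts = rng.normal(size=(20, 2))   # hypothetical finite sample E_n
    d_n = rank_surrogate(len(pts), make_ordinal_oracle(pts))

The surrogate preserves the ordering of distances by construction, which is exactly the information $D_d$ carries; the paper's result says this ordering already pins down $d$ up to scale.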


MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

arXiv.org Machine Learning

Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black-box matching procedures that would be too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters, including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well, and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single-cell molecular data.
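
A minimal sketch of the partition-match-recurse idea, assuming k-means as the partitioner and the Hungarian algorithm as the expensive black-box matcher; both are hypothetical stand-ins for MREC's pluggable components.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.cluster import KMeans

    def blackbox_match(X, Y):
        """Expensive exact matcher, affordable only on small inputs."""
        cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows, cols))

    def mrec(X, Y, ix, iy, k=4, leaf=64, out=None):
        """Partition both sides, match the partitions via their centers,
        then recurse inside each matched pair of partitions."""
        if out is None:
            out = []
        if min(len(ix), len(iy)) <= leaf:
            out.extend((ix[a], iy[b]) for a, b in blackbox_match(X[ix], Y[iy]))
            return out
        px = KMeans(n_clusters=k, n_init=4, random_state=0).fit(X[ix])
        py = KMeans(n_clusters=k, n_init=4, random_state=0).fit(Y[iy])
        for a, b in blackbox_match(px.cluster_centers_, py.cluster_centers_):
            mrec(X, Y, ix[px.labels_ == a], iy[py.labels_ == b], k, leaf, out)
        return out

    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 3))
    Y = X + 0.05 * rng.normal(size=(512, 3))
    pairs = mrec(X, Y, np.arange(512), np.arange(512))

The leaf size and the number of partitions govern the cost/quality trade-off: the black-box matcher only ever sees inputs of at most the leaf size, which is what makes the scheme applicable to very large data sets.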


Improved Error Bounds for Tree Representations of Metric Spaces

Neural Information Processing Systems

Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov.
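
The "inherent treeness" mentioned above is commonly measured by Gromov's $\delta$-hyperbolicity via the four-point condition; a brute-force sketch over a finite distance matrix ($O(n^4)$, small inputs only, helper names hypothetical):

    import itertools
    import numpy as np

    def gromov_product(d, x, y, w):
        """Gromov product (x|y)_w = (d(w,x) + d(w,y) - d(x,y)) / 2."""
        return 0.5 * (d[w, x] + d[w, y] - d[x, y])

    def hyperbolicity(d):
        """Smallest delta with (x|y)_w >= min((x|z)_w, (y|z)_w) - delta
        for all quadruples; delta = 0 exactly on tree metrics."""
        n = d.shape[0]
        delta = 0.0
        for w, x, y, z in itertools.permutations(range(n), 4):
            gap = (min(gromov_product(d, x, z, w), gromov_product(d, y, z, w))
                   - gromov_product(d, x, y, w))
            delta = max(delta, gap)
        return delta

Gromov's classical result embeds any $\delta$-hyperbolic $n$-point metric into a tree metric with additive distortion $O(\delta \log n)$; the dependence on cardinality in bounds of this type is what the abstract argues should be replaced by finer metric invariants.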


Non-Aligned Distribution Distance using Metric Measure Embedding and Optimal Transport

arXiv.org Machine Learning

We propose a novel approach for comparing distributions whose supports do not necessarily lie on the same metric space. Unlike the Gromov-Wasserstein (GW) distance, which compares pairwise distances of elements from each distribution, we consider a method that embeds the metric measure spaces into a common Euclidean space and computes an optimal transport (OT) distance on the embedded distributions. This leads to what we call the sub-embedding robust Wasserstein (SERW) distance. Under some conditions, SERW is a distance defined as an OT distance between the (low-distortion) embedded distributions under a common metric. In addition to this novel proposal, which generalizes several recent OT works, our contributions rest on several theoretical analyses: i) we characterize the embedding spaces needed to define the SERW distance for distribution alignment; ii) we prove that SERW satisfies almost the same properties as the GW distance, and we give a cost relation between GW and SERW. The paper also provides numerical experiments illustrating how SERW behaves on real-world matching problems.
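
A minimal sketch of the embed-then-transport recipe, assuming classical MDS as the common-space embedding and an assignment solver as the OT step on uniform measures; these are hypothetical stand-ins, since the paper's actual low-distortion embedding and solver are not reproduced here.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def classical_mds(D, dim=2):
        """Embed a finite metric (distance matrix D) into R^dim by
        double centering and truncated eigendecomposition."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        w, V = np.linalg.eigh(B)
        top = np.argsort(w)[::-1][:dim]
        return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

    def serw_like(DX, DY, dim=2):
        """Embed both spaces into a common R^dim, then solve OT between
        the embedded uniform point clouds (a SERW-style surrogate)."""
        X, Y = classical_mds(DX, dim), classical_mds(DY, dim)
        cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].mean()

    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 3)); B = rng.normal(size=(30, 5))
    DX = np.linalg.norm(A[:, None] - A[None], axis=-1)
    DY = np.linalg.norm(B[:, None] - B[None], axis=-1)
    print(serw_like(DX, DY))

Note that classical MDS fixes each embedding only up to a rigid motion, an alignment issue this sketch ignores; handling such invariances is part of what a robust formulation must address.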


Computationally Efficient Tree Variants of Gromov-Wasserstein

arXiv.org Machine Learning

We propose two novel variants of Gromov-Wasserstein (GW) between probability measures in different probability spaces, based on projecting these measures into tree metric spaces. Our first proposed discrepancy, named \emph{flow-based tree Gromov-Wasserstein}, hinges upon the tree metric from node to root in each tree to define the structure representation of probability measures on trees. The flow-based tree GW shares a similar structure with the univariate Wasserstein distance while keeping sufficient spatial information about the original projected probability measures. To further exploit the tree structure, we propose another version of flow-based tree GW, which we refer to as \emph{depth-based tree Gromov-Wasserstein}. This discrepancy aligns probability measures hierarchically along each depth level of the tree structures. Finally, we demonstrate via extensive simulation studies on large-scale real data sets the relative advantages of the proposed discrepancies.
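
To make the flow idea concrete, here is a rough sketch of the edge-flow statistic underlying tree-based transport discrepancies, assuming a fixed tree given as parent pointers with edge lengths and two measures supported on its nodes. This hypothetical setup illustrates the flow representation only, not the full GW variants, which compare measures projected onto different trees.

    import numpy as np

    def edge_flows(parent, mass):
        """Mass flowing through the edge (v, parent[v]) toward the root:
        the total measure of the subtree rooted at v."""
        n = len(parent)
        flow = np.array(mass, dtype=float)
        # children before parents: assumes parent[v] < v, with root = 0
        for v in range(n - 1, 0, -1):
            flow[parent[v]] += flow[v]
        return flow[1:]   # the root has no outgoing edge

    def flow_distance(parent, lengths, mu, nu):
        """Length-weighted L1 gap between the two measures' edge flows,
        the flow statistic behind tree-Wasserstein-style discrepancies."""
        return float(np.sum(lengths * np.abs(edge_flows(parent, mu)
                                             - edge_flows(parent, nu))))

    parent = [0, 0, 0, 1, 1]                   # toy tree, root = node 0
    lengths = np.array([1.0, 2.0, 0.5, 0.5])   # edge (v, parent[v]), v = 1..4
    mu = [0.0, 0.0, 0.5, 0.25, 0.25]
    nu = [0.0, 0.5, 0.5, 0.0, 0.0]
    print(flow_distance(parent, lengths, mu, nu))

On a single tree this length-weighted flow gap is exactly the closed-form tree Wasserstein distance, which is why tree projections yield such fast discrepancies at scale.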