MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

arXiv.org Machine Learning

Comparing and aligning large datasets is a pervasive problem across many knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black-box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters, including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when it can be expected to work well, and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data.
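The partition, match, and recurse idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not MREC's implementation: the partitioner (sort along one coordinate and cut into `k` equal parts, cycling axes like a k-d tree), the black-box matcher (brute-force optimal assignment), the parameters `k` and `leaf`, and all function names are hypothetical, and the two clouds are assumed to have the same size.

```python
import itertools
import math


def best_assignment(cost):
    """Hypothetical black-box matcher: brute-force optimal assignment on a small square cost matrix."""
    n = len(cost)
    best_perm, best_val = tuple(range(n)), math.inf
    for perm in itertools.permutations(range(n)):
        val = sum(cost[i][perm[i]] for i in range(n))
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm


def split_by_axis(idx, pts, k, axis):
    """Hypothetical partitioner: sort indices by one coordinate and cut into k near-equal parts."""
    order = sorted(idx, key=lambda i: pts[i][axis])
    size, extra = divmod(len(order), k)
    parts, start = [], 0
    for c in range(k):
        end = start + size + (1 if c < extra else 0)
        parts.append(order[start:end])
        start = end
    return parts


def centroid(idx, pts):
    dim = len(pts[idx[0]])
    return tuple(sum(pts[i][d] for i in idx) / len(idx) for d in range(dim))


def mrec_sketch(X, Y, k=2, leaf=4):
    """Match equal-size point clouds X and Y by recursive partition-and-match.

    Returns a dict mapping each index of X to a distinct index of Y.
    """
    if len(X) != len(Y):
        raise ValueError("this sketch assumes |X| == |Y|")
    if k < 2 or leaf < k:
        raise ValueError("need k >= 2 and leaf >= k so parts shrink and are never empty")
    matching = {}

    def recurse(ix, iy, depth):
        if len(ix) <= leaf:
            # Base case: small enough for the expensive black-box matcher.
            cost = [[math.dist(X[i], Y[j]) for j in iy] for i in ix]
            for a, b in enumerate(best_assignment(cost)):
                matching[ix[a]] = iy[b]
            return
        # Partition both clouds, cycling through coordinates as depth increases.
        axis = depth % len(X[ix[0]])
        parts_x = split_by_axis(ix, X, k, axis)
        parts_y = split_by_axis(iy, Y, k, axis)
        # Match parts by centroid distance; only equal-size parts may be paired.
        cost = [[math.dist(centroid(px, X), centroid(py, Y)) if len(px) == len(py) else math.inf
                 for py in parts_y] for px in parts_x]
        # Recursively match the points inside each matched pair of parts.
        for a, b in enumerate(best_assignment(cost)):
            recurse(parts_x[a], parts_y[b], depth + 1)

    recurse(list(range(len(X))), list(range(len(Y))), 0)
    return matching


# Usage: Y is X with its points reordered, so an exact matching exists.
X = [(float(i), float((i * 7) % 17)) for i in range(16)]
order = [(5 * i + 3) % 16 for i in range(16)]
Y = [X[p] for p in order]
m = mrec_sketch(X, Y, k=2, leaf=4)
print(all(Y[m[i]] == X[i] for i in range(len(X))))  # True
```

The expensive matcher is only ever called on blocks of at most `leaf` points or `k` parts, which is the property that lets the recursion scale; a clustering-based partitioner or a proper assignment solver could be swapped in without changing `recurse`.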


Recovering metric from full ordinal information

arXiv.org Machine Learning

Given a geodesic space $(E, d)$, we show that full ordinal knowledge of the metric $d$, i.e. knowledge of the function $D_d : (w, x, y, z) \mapsto \mathbb{1}_{d(w,x) \le d(y,z)}$, determines the metric $d$ uniquely, up to a constant factor. For a subspace $E_n$ of $n$ points of $E$ converging in Hausdorff distance to $E$, we construct a metric $d_n$ on $E_n$ based only on the knowledge of $D_d$ on $E_n$, and establish a sharp upper bound on the Gromov-Hausdorff distance between $(E_n, d_n)$ and $(E, d)$.
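To make the ordinal function $D_d$ concrete, the sketch below tabulates it on a small finite sample and checks that it cannot tell $d$ apart from $c \cdot d$ for $c > 0$, which is why uniqueness can only hold up to a constant factor. On a finite sample it is also blind to other strictly increasing transforms such as $\sqrt{d}$ (itself a metric), which suggests why the uniqueness result leans on the geodesic structure of $E$. The choice of points, the Manhattan metric, and the function names are illustrative assumptions; the paper's construction of $d_n$ is not reproduced here.

```python
import itertools
import math


def ordinal_function(points, dist):
    """Tabulate D_d(w, x, y, z) = 1 if d(w, x) <= d(y, z) else 0 over all index quadruples."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    return {
        q: int(d[q[0]][q[1]] <= d[q[2]][q[3]])
        for q in itertools.product(range(n), repeat=4)
    }


def l1(p, q):
    # Manhattan distance; integer inputs keep every comparison exact.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


pts = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 2)]
D = ordinal_function(pts, l1)
D_scaled = ordinal_function(pts, lambda p, q: 3 * l1(p, q))
D_sqrt = ordinal_function(pts, lambda p, q: math.sqrt(l1(p, q)))

print(len(D))          # 625 quadruples for 5 points
print(D == D_scaled)   # True: scaling by c > 0 preserves every comparison
print(D == D_sqrt)     # True: so does any strictly increasing transform
```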


Representative Datasets: The Perceptron Case

arXiv.org Machine Learning

One of the main drawbacks of the practical use of neural networks is the long training time. Training consists of iteratively changing the parameters to minimize a loss function. These changes are driven by a dataset, which can be seen as a set of labeled points in an n-dimensional space. In this paper, we explore the concept of a representative dataset, which is smaller than the original dataset and satisfies a nearness condition that is independent of isometric transformations. Representativeness is measured using persistence diagrams, owing to their computational efficiency. We also prove that the accuracy of the learning process of a neural network on a representative dataset is comparable with the accuracy on the original dataset when the neural network architecture is a perceptron and the loss function is the mean squared error. These theoretical results, together with experiments, open the door to reducing the size of the dataset in order to save time in the training process of any neural network.
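The sketch below illustrates the experiment the abstract describes: build a smaller dataset in which every original point has a nearby point with the same label, train a perceptron with mean squared error on both datasets, and compare accuracy on the full data. It rests on several assumptions: the nearness condition is simplified to "within Euclidean distance `eps` of a kept point with the same label" (it is not isometry-invariant and no persistence diagrams are computed), the greedy selection, the sigmoid activation, the learning rate, and all function names are hypothetical, and no particular accuracy values are claimed.

```python
import math
import random


def eps_representative(points, labels, eps):
    """Greedily keep a point unless a kept point with the same label lies within eps of it."""
    kept = []
    for p, y in zip(points, labels):
        if not any(ky == y and math.dist(p, kp) <= eps for kp, ky in kept):
            kept.append((p, y))
    return [p for p, _ in kept], [y for _, y in kept]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_perceptron(points, labels, lr=2.0, epochs=300):
    """Full-batch gradient descent on the mean squared error of a sigmoid perceptron."""
    dim, m = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(points, labels):
            s = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
            # Derivative of (s - y)^2 / m with respect to the pre-activation.
            g = 2.0 * (s - y) * s * (1.0 - s) / m
            gw = [gk + g * xk for gk, xk in zip(gw, x)]
            gb += g
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b


def accuracy(w, b, points, labels):
    hits = sum(
        (sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) >= 0.5) == (y == 1)
        for x, y in zip(points, labels)
    )
    return hits / len(points)


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
labels = [1 if x + y > 1.0 else 0 for x, y in points]
sub_points, sub_labels = eps_representative(points, labels, eps=0.1)

w_full, b_full = train_perceptron(points, labels)
w_sub, b_sub = train_perceptron(sub_points, sub_labels)
print(len(sub_points), "of", len(points), "points kept")
# Both models are evaluated on the full dataset.
print(accuracy(w_full, b_full, points, labels), accuracy(w_sub, b_sub, points, labels))
```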


Gromov-Wasserstein Learning for Graph Matching and Node Embedding

arXiv.org Machine Learning

A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using the Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence according to the learned optimal transport. The node embeddings of the two graphs are learned under the guidance of the optimal transport, so that the distances between embeddings not only reflect the topological structure of each graph but also yield the correspondence across the graphs. These two learning steps are mutually beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks and demonstrate its superior performance compared to alternative approaches.
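As a reference point for the discrepancy being minimized, the sketch below evaluates the discrete Gromov-Wasserstein objective with squared loss, $\sum_{i,j,k,l} (C_1[i,k] - C_2[j,l])^2\, T[i,j]\, T[k,l]$, for a fixed coupling $T$, and reads a node correspondence off the coupling by a row-wise argmax. Using shortest-path distances as the intra-graph relation matrices, the example graphs, and the function names are assumptions for illustration; the learned embeddings, structural regularizers, and proximal point solver from the paper are not reproduced.

```python
import itertools


def path_distances(n):
    """Shortest-path distance matrix of the path graph 0 - 1 - ... - (n-1)."""
    return [[abs(i - k) for k in range(n)] for i in range(n)]


def gw_objective(C1, C2, T):
    """Discrete GW objective: sum (C1[i][k] - C2[j][l])**2 * T[i][j] * T[k][l]."""
    n1, n2 = len(C1), len(C2)
    total = 0.0
    for i, k in itertools.product(range(n1), repeat=2):
        for j, l in itertools.product(range(n2), repeat=2):
            total += (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
    return total


def correspondence(T):
    """Match each node of graph 1 to the node of graph 2 receiving the most mass."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in T]


# Graph 2 is graph 1 with node i relabelled as perm[i].
n = 4
perm = [2, 0, 3, 1]
C1 = path_distances(n)
C2 = [[0] * n for _ in range(n)]
for i in range(n):
    for k in range(n):
        C2[perm[i]][perm[k]] = C1[i][k]

T_perm = [[1.0 / n if j == perm[i] else 0.0 for j in range(n)] for i in range(n)]
T_unif = [[1.0 / (n * n)] * n for _ in range(n)]

print(gw_objective(C1, C2, T_perm))       # 0.0
print(gw_objective(C1, C2, T_unif) > 0)   # True
print(correspondence(T_perm) == perm)     # True
```

A coupling that aligns isomorphic graphs drives the objective to zero, while the uniform coupling, which ignores structure, does not; a solver's job is to search for low-objective couplings.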


Computationally Efficient Tree Variants of Gromov-Wasserstein

arXiv.org Machine Learning

We propose two novel variants of Gromov-Wasserstein (GW) between probability measures in different probability spaces, based on projecting these measures into tree metric spaces. Our first proposed discrepancy, named \emph{flow-based tree Gromov-Wasserstein}, relies on the node-to-root tree metric in each tree to define a structural representation of probability measures on trees. The flow-based tree GW shares a similar structure with the univariate Wasserstein distance while retaining sufficient spatial information about the original projected probability measures. To further exploit the tree structure, we propose another version of flow-based tree GW, which we refer to as \emph{depth-based tree Gromov-Wasserstein}. This discrepancy aligns probability measures hierarchically along each depth level of the tree structures. Finally, we demonstrate the relative advantage of the proposed discrepancies through extensive simulation studies on large-scale real data sets.
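For orientation only, the sketch below computes two ingredients the abstract alludes to: node-to-root distances on a weighted tree, and the standard closed-form tree-Wasserstein distance, in which each edge contributes its length times the absolute difference of the two measures' masses in the subtree below it. This is the well-known tree-Wasserstein distance, not the proposed flow-based or depth-based tree GW discrepancies, whose exact definitions are not given here. On a path tree it reduces to the univariate Wasserstein distance, which echoes the comparison in the abstract. The parent-array encoding (root at index 0, `parent[v] < v`) and the function names are assumptions.

```python
def root_distances(parent, weight):
    """Node-to-root distances; assumes parent[0] == -1 and parent[v] < v for every other node."""
    dist = [0.0] * len(parent)
    for v in range(1, len(parent)):
        dist[v] = dist[parent[v]] + weight[v]
    return dist


def tree_wasserstein(parent, weight, mu, nu):
    """Closed-form 1-Wasserstein distance between node masses mu and nu on a weighted tree.

    Edge (v, parent[v]) contributes weight[v] * |mu(subtree(v)) - nu(subtree(v))|.
    """
    diff = [a - b for a, b in zip(mu, nu)]
    total = 0.0
    # Children have larger indices than parents, so a descending sweep
    # completes each subtree's mass difference before its parent edge is used.
    for v in range(len(parent) - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


def univariate_w1(xs, ys):
    """W1 between two uniform empirical measures of equal size on the real line."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# A path tree 0 - 1 - 2 - 3 with unit edges behaves like points on a line.
parent, weight = [-1, 0, 1, 2], [0.0, 1.0, 1.0, 1.0]
mu, nu = [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]
pos = root_distances(parent, weight)                      # [0.0, 1.0, 2.0, 3.0]
print(tree_wasserstein(parent, weight, mu, nu))           # 2.0
print(univariate_w1([pos[0], pos[1]], [pos[2], pos[3]]))  # 2.0
```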