gromov
Improved Error Bounds for Tree Representations of Metric Spaces
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science > Data Mining (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Fitting trees to \ell_1 -hyperbolic distances
Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no \emph{a priori} constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametric) values over all triples of points and compare the $\ell_p$ norms of this vector with the $\ell_q$ norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized $\ell_1$ norm of the hyperbolicity vector. Furthermore, we can interpret the classical tree fitting result of Gromov as a $p = q = \infty$ result. We present an algorithm \textsc{HCCRootedTreeFit} such that the $\ell_1$ error of the output embedding is analytically bounded in terms of the $\ell_1$-norm of the hyperbolicity vector (i.e., $p = q = 1$) and that this result is tight. Furthermore, this algorithm has significantly different theoretical and empirical performance as compared to Gromov's result and related algorithms. Finally, we show using \textsc{HCCRootedTreeFit} and related tree fitting algorithms, that supposedly standard data sets for hierarchical data analysis and geometric graph neural networks have radically different tree fits than those of synthetic, truly tree-like data sets, suggesting that a much more refined analysis of these standard data sets is called for.
Improved Error Bounds for Tree Representations of Metric Spaces
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science > Data Mining (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Gromov-Wasserstein and optimal transport: from assignment problems to probabilistic numeric
Seyedi, Iman, Candelieri, Antonio, Messina, Enza, Archetti, Francesco
The assignment problem, a cornerstone of operations research, seeks an optimal one-to-one mapping between agents and tasks to minimize total cost. This work traces its evolution from classical formulations and algorithms to modern optimal transport (OT) theory, positioning the Quadratic Assignment Problem (QAP) and related structural matching tasks within this framework. We connect the linear assignment problem to Monge's transport problem, Kantorovich's relaxation, and Wasserstein distances, then extend to cases where source and target lie in different metric-measure spaces requiring Gromov-Wasserstein (GW) distances. GW formulations, including the fused GW variant that integrates structural and feature information, naturally address QAP-like problems by optimizing alignment based on both intra-domain distances and cross-domain attributes. Applications include graph matching, keypoint correspondence, and feature-based assignments. We present exact solvers, Genetic Algorithms (GA), and multiple GW variants, including a proposed multi-initialization strategy (GW-MultiInit) that mitigates the risk of getting stuck in local optima alongside entropic Sinkhorn-based approximations and fused GW. Computational experiments on capacitated QAP instances show that GW-MultiInit consistently achieves near-optimal solutions and scales efficiently to large problems where exact methods become impractical, while parameterized EGW and FGW variants provide flexible trade-offs between accuracy and runtime. Our findings provide theoretical foundations, computational insights, and practical guidelines for applying OT and GW methods to QAP and other real-world matching problems, such as those in machine learning and logistics.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Italy (0.04)
GENOT: Entropic (Gromov) Wasserstein Flow Matching with Applications to Single-Cell Genomics
Single-cell genomics has significantly advanced our understanding of cellular behavior, catalyzing innovations in treatments and precision medicine. However,single-cell sequencing technologies are inherently destructive and can only measure a limited array of data modalities simultaneously. Optimal transport (OT)has emerged as a potent solution, but traditional discrete solvers are hampered byscalability, privacy, and out-of-sample estimation issues. These challenges havespurred the development of neural network-based solvers, known as neural OTsolvers, that parameterize OT maps. Yet, these models often lack the flexibilityneeded for broader life science applications.
Improved Error Bounds for Tree Representations of Metric Spaces
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov.
Fitting trees to \ell_1 -hyperbolic distances
Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no \emph{a priori} constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametric) values over all triples of points and compare the \ell_p norms of this vector with the \ell_q norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized \ell_1 norm of the hyperbolicity vector.