

 tree metric



Fitting trees to ℓ1-hyperbolic distances

Neural Information Processing Systems

Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no a priori constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relationship between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametricity) values over all triples of points and compare the ℓp norms of this vector with the ℓq norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized ℓ1 norm of the hyperbolicity vector. Furthermore, we can interpret the classical tree fitting result of Gromov as a p = q = ∞ result. We present an algorithm, HCCROOTEDTREEFIT, such that the ℓ1 error of the output embedding is analytically bounded in terms of the ℓ1 norm of the hyperbolicity vector (i.e., p = q = 1), and this bound is tight. Furthermore, this algorithm has significantly different theoretical and empirical performance compared to Gromov's result and related algorithms.
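To make the hyperbolicity vector and its norms concrete, here is a minimal Python sketch. It assumes one common per-triple definition based on the Gromov product relative to a fixed base point w, (x|y)_w = ½(d(x,w) + d(y,w) − d(x,y)): for each triple, take the gap between the two smallest of its three Gromov products, which is zero for a tree metric. The function names, the choice of base point, and dividing the ℓ1 norm by the number of triples to get an "average" are illustrative assumptions and may differ from the paper's exact definitions.

```python
from itertools import combinations


def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w = (d(x, w) + d(y, w) - d(x, y)) / 2."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def hyperbolicity_vector(d, w=0):
    """Per-triple four-point violation relative to base point w (assumed definition).

    For each unordered triple {x, y, z}, sort the Gromov products
    (x|y)_w, (x|z)_w, (y|z)_w. In a tree metric the two smallest are equal,
    so their gap measures how far the triple is from being tree-like.
    """
    vec = []
    for x, y, z in combinations(range(len(d)), 3):
        a, b, _ = sorted((gromov_product(d, x, y, w),
                          gromov_product(d, x, z, w),
                          gromov_product(d, y, z, w)))
        vec.append(b - a)
    return vec


def lp_norm(v, p):
    """l_p norm of a vector; p = float('inf') gives the largest entry."""
    if p == float("inf"):
        return max((abs(t) for t in v), default=0.0)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


# Usage: the 4-cycle with unit edges is not a tree metric.
c4 = [[0, 1, 2, 1],
      [1, 0, 1, 2],
      [2, 1, 0, 1],
      [1, 2, 1, 0]]
v = hyperbolicity_vector(c4, w=0)
print(v)                          # [0.0, 0.0, 0.0, 1.0]
print(lp_norm(v, 1))              # 1.0  (l1 norm)
print(lp_norm(v, float("inf")))   # 1.0  (l_inf norm: four-point delta w.r.t. base point 0)
print(lp_norm(v, 1) / len(v))     # 0.25 (l1 norm averaged over triples)
```

On the 4-cycle only the triple {1, 2, 3} violates the condition, so the ℓ1 and ℓ∞ norms coincide here; on larger inputs they separate, which is the gap between "average" and worst-case hyperbolicity that the p = q = 1 versus p = q = ∞ comparison is about. A path metric d(i, j) = |i − j|, being a tree metric, yields the all-zero vector.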







[...] The TSW kernel is [...]

Neural Information Processing Systems

Although Prop. 2 follows from Prop. 1, it follows the idea [...] An upper bound on the Euclidean OT [...] The [...] We will insist more on the importance of sampling tree metrics randomly, both for low-dimensional [...] in 6.1 [...] Definite-negativity is mentioned and highlighted [...] explain why it is important. Is this to ensure that the kernel is positive-definite? This is why kernel methods kick in from .6 (or Gaussian processes, as per Reviewer #2's suggestion). Indeed, an average of negative definite functions is trivially negative definite. We used farthest-point clustering because of its fast computation, i.e. [...]
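The argument above (tree metrics are negative definite, an average of negative definite functions stays negative definite, so a kernel built from that average is positive definite) can be sanity-checked numerically. The minimal Python sketch below is an illustration under stated assumptions, not the authors' construction: it averages shortest-path metrics of a few random weighted trees on a shared node set (rather than tree-sliced Wasserstein distances between measures), and it assumes a kernel of the form exp(−t·d̄), which by Schoenberg's theorem is positive semidefinite for every t > 0 exactly when d̄ is negative definite. The helper names and parameters (`random_tree_metric`, `check`, `num_trees`, `t`, `trials`) are hypothetical. Random probes can only fail to find a violation; they do not prove definiteness.

```python
import math
import random


def random_tree_metric(n, rng):
    """Shortest-path metric of a random weighted tree on nodes 0..n-1."""
    adj = {i: [] for i in range(n)}
    for i in range(1, n):
        parent = rng.randrange(i)      # attach node i to an earlier node
        w = rng.uniform(0.1, 1.0)      # positive edge weight
        adj[i].append((parent, w))
        adj[parent].append((i, w))
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        # Depth-first traversal: in a tree the path to each node is unique.
        stack, seen = [(s, 0.0)], {s}
        while stack:
            u, du = stack.pop()
            dist[s][u] = du
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, du + w))
    return dist


def average_metric(metrics):
    """Entrywise mean of several distance matrices on the same points."""
    m, n = len(metrics), len(metrics[0])
    return [[sum(d[i][j] for d in metrics) / m for j in range(n)]
            for i in range(n)]


def quad_form(M, c):
    """Compute c^T M c."""
    n = len(c)
    return sum(c[i] * M[i][j] * c[j] for i in range(n) for j in range(n))


def check(n=8, num_trees=5, t=0.7, trials=2000, seed=0):
    """Return (max c0^T D c0 over zero-sum c0, min c^T K c over all probes c)."""
    rng = random.Random(seed)
    d_bar = average_metric([random_tree_metric(n, rng) for _ in range(num_trees)])
    K = [[math.exp(-t * x) for x in row] for row in d_bar]  # assumed kernel form
    worst_nd, worst_pd = -math.inf, math.inf
    for _ in range(trials):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(c) / n
        c0 = [x - mean for x in c]                      # zero-sum probe
        worst_nd = max(worst_nd, quad_form(d_bar, c0))  # should stay <= 0
        worst_pd = min(worst_pd, quad_form(K, c))       # should stay >= 0
    return worst_nd, worst_pd


worst_nd, worst_pd = check()
print(worst_nd <= 1e-9, worst_pd >= -1e-9)  # True True
```

The zero-sum probes test conditional negative definiteness of the averaged metric, while unconstrained probes test positive semidefiniteness of the kernel; the small tolerances only absorb floating-point rounding.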