AITopics | tree space

Phylogenetics is a classical methodology in computational biology that has today become highly relevant for the medical investigation of single-cell data, e.g., in the context of cancer development. Unfortunately, the exponential size of the tree space is a formidable obstacle for current Bayesian phylogenetic inference with Markov chain Monte Carlo (MCMC) methods, since these rely on local operations. And although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive automatic differentiation for learning the variational parameters. We propose VaiPhy, a remarkably fast VI-based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with state-of-the-art methods on real data, and is considerably faster since it does not require automatic differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, to the best of our knowledge the first scheme for sampling branch lengths directly from the popular Jukes-Cantor model. We compare VaiPhy against the baselines in terms of density estimation and runtime. Additionally, we evaluate the reproducibility of the baselines.
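The abstract does not spell out how the JC sampler works. As background only, the sketch below shows what "branch lengths under the Jukes-Cantor model" means for the simplest case of two aligned DNA sequences: the JC69 log-likelihood of a branch length t (expected substitutions per site), its closed-form maximum-likelihood value (the classic JC distance), and a crude grid-based posterior draw. The function names, the Exponential(rate) prior, and the grid approximation are our assumptions; this is not VaiPhy's JC sampler.

```python
import math
import random


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood (up to an additive constant) of branch length t, in expected
    substitutions per site, for two aligned DNA sequences that differ at n_diff of
    n_sites sites under the Jukes-Cantor (JC69) model."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(same base at both ends of the branch)
    p_other = 0.25 - 0.25 * decay  # P(one specific different base)
    if n_diff == 0:
        return n_sites * math.log(p_same)
    if p_other <= 0.0:
        return -math.inf           # t = 0 cannot explain observed differences
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_other)


def jc69_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood branch length under JC69."""
    p = n_diff / n_sites
    if p >= 0.75:
        return math.inf            # saturated: no finite estimate
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def sample_branch_length(n_sites, n_diff, rate=10.0, t_max=3.0, grid=3000, rng=random):
    """Draw t from a grid approximation of p(t | data), assuming an
    Exponential(rate) prior on t truncated to (0, t_max). Illustrative only."""
    ts = [(k + 0.5) * t_max / grid for k in range(grid)]
    log_post = [jc69_log_likelihood(t, n_sites, n_diff) - rate * t for t in ts]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # shift to avoid underflow
    return rng.choices(ts, weights=weights, k=1)[0]
```

For two sequences differing at 10 of 100 sites, `jc69_distance(100, 10)` is about 0.1073, and the grid sampler concentrates its draws around that value (pulled slightly toward 0 by the assumed prior).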

name change, vaiphy, variational inference

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Generalizing Tree Probability Estimation via Bayesian Networks

Cheng Zhang, Frederick A Matsen IV

Neural Information Processing Systems, Nov-20-2025, 19:21:55 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, bayesian inference, machine learning

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

093f65e080a295f8076b1c5722a46aa2-Paper.pdf

Neural Information Processing Systems, Oct-1-2025, 23:57:00 GMT

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country: North America > United States > New York > New York County > New York City (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.69)

5e956fef0946dc1e39760f94b78045fe-Paper-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 05:13:07 GMT

inference, phylogenetic inference, vaiphy

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Biomedical Informatics (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

VaiPhy: a Variational Inference Based Algorithm for Phylogeny

Neural Information Processing Systems, Oct-11-2024, 06:47:35 GMT

Phylogenetics is a classical methodology in computational biology that has today become highly relevant for the medical investigation of single-cell data, e.g., in the context of cancer development. Unfortunately, the exponential size of the tree space is a formidable obstacle for current Bayesian phylogenetic inference with Markov chain Monte Carlo (MCMC) methods, since these rely on local operations. And although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive automatic differentiation for learning the variational parameters. We propose VaiPhy, a remarkably fast VI-based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with state-of-the-art methods on real data, and is considerably faster since it does not require automatic differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, to the best of our knowledge the first scheme for sampling branch lengths directly from the popular Jukes-Cantor model.

tree space, vaiphy, variational inference

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Sampling the Swadesh List to Identify Similar Languages with Tree Spaces

Ordway, Garett, Patrangenaru, Vic

arXiv.org Artificial Intelligence, May-10-2024

Communication plays a vital role in human interaction. Studying language is a worthwhile task that has recently become quantitative in nature with the development of fields such as quantitative comparative linguistics and lexicostatistics. Given the authors' own native languages, the ancestry of the English language and of the Latin alphabet is of primary interest. The Indo-European tree traces many modern languages back to the Proto-Indo-European root. Swadesh's cognates played a large role in developing that historical perspective, in which some of the primary branches are Germanic, Celtic, Italic, and Balto-Slavic. This paper applies data analysis on open books, where the simplest singular space is the 3-spider (a union T3 of three rays with their endpoints glued at a point 0), which can represent these tree spaces for language clustering. The trees are built with single-linkage clustering based on distances between samples from languages that use the Latin script. Taking three languages at a time, we determine the barycenter. Initial results include both non-sticky and sticky sample means. If the mean is non-sticky, then one language may come from a different ancestor than the other two. If the mean is sticky, then the languages may share a common ancestor, or all three may have different ancestry.
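The abstract does not state how each language triple is placed on T3 or how the barycenter is computed. A minimal sketch, assuming each sample becomes a rooted three-leaf tree whose topology selects a leg of T3 (the outgroup language) and whose interior edge length gives the distance r from 0. The barycenter is taken as the Fréchet mean: along leg j at distance x, the Fréchet function is (1/n)[sum over points on leg j of (r − x)^2 + sum over the other points of (r + x)^2], which is minimised at x = m_j = (sum of r on leg j − sum of r elsewhere)/n. If some m_j > 0 the mean lies inside leg j (non-sticky); otherwise it sticks to 0. The functions `single_linkage_point` and `spider_mean` are hypothetical names, not the authors' code.

```python
def single_linkage_point(d):
    """Map a symmetric 3x3 distance matrix to a point (leg, r) on the 3-spider T3.

    Assumed mapping (for illustration): single linkage first merges the closest
    pair at height h1 and then joins the remaining language at height h2; the leg
    is the index of that outgroup language and r = h2 - h1 is the interior edge.
    """
    pairs = [(0, 1), (0, 2), (1, 2)]
    a, b = min(pairs, key=lambda pair: d[pair[0]][pair[1]])
    out = 3 - a - b
    h1 = d[a][b]
    h2 = min(d[out][a], d[out][b])  # single-linkage distance to the merged pair
    return out, h2 - h1


def spider_mean(points, legs=3):
    """Fréchet mean (barycenter) of points (leg, r) on a spider with `legs` rays.

    Along leg j the Fréchet function is minimised at
        m_j = (sum of r on leg j - sum of r on the other legs) / n.
    At most one leg can have m_j > 0; if none does, the mean sticks to the origin.
    """
    n = len(points)
    total = sum(r for _, r in points)
    for leg in range(legs):
        on_leg = sum(r for leg_i, r in points if leg_i == leg)
        m = (on_leg - (total - on_leg)) / n
        if m > 0:
            return leg, m          # non-sticky: mean lies inside this leg
    return None, 0.0               # sticky: mean is the cone point 0
```

For example, `spider_mean([(0, 1.0), (0, 2.0), (1, 0.5)])` lies on leg 0 at distance 2.5/3, so language 0 would be read as the outgroup, while three equal points on three different legs give the sticky mean `(None, 0.0)`.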

open book, phylogenetic tree, swadesh

arXiv.org Artificial Intelligence

2405.06549

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
Europe > Middle East (0.04)
Asia > Middle East (0.04)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)

Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees

Penn, Matthew J, Scheidwasser, Neil, Penn, Joseph, Donnelly, Christl A, Duchêne, David A, Bhatt, Samir

arXiv.org Artificial Intelligence, Jan-23-2024

Phylogenetics is now fundamental in the life sciences, providing insights into the earliest branches of life and the origins and spread of epidemics. However, finding suitable phylogenies in the vast space of possible trees remains challenging. To address this problem, we perform, for the first time, both tree exploration and inference in a continuous space where gradients can be computed. This continuous relaxation allows major leaps across tree space for both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach is also effective on empirical datasets with very limited data, which we demonstrate on the phylogeny of jawed vertebrates. Indeed, only a few genes with an ultrametric signal were generally sufficient to resolve the major lineages of vertebrates. Optimisation is possible via automatic differentiation, and our method presents an effective way forward for exploring the most difficult, data-deficient phylogenetic questions.
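How vast the space of possible trees is can be made concrete with a standard combinatorial result: there are (2n − 3)!! rooted and (2n − 5)!! unrooted binary labelled topologies on n leaves. The small sketch below (function names are our own) evaluates these counts; for 10 leaves it gives 2,027,025 unrooted and 34,459,425 rooted topologies, before branch lengths are even considered.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ..., with k!! = 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_rooted_topologies(n):
    """Number of rooted binary labelled topologies on n >= 2 leaves: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def n_unrooted_topologies(n):
    """Number of unrooted binary labelled topologies on n >= 3 leaves: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, n_unrooted_topologies(n), n_rooted_topologies(n))
```

The counts grow faster than any exponential in n, which is why exhaustive search is hopeless and why local-move heuristics, or continuous relaxations such as the one described above, are used instead.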

algorithm, inference, node

arXiv.org Artificial Intelligence

doi: 10.1093/gbe/evad213

2306.05739

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

Phylo2Vec: a vector representation for binary trees

Penn, Matthew J, Scheidwasser, Neil, Khurana, Mark P, Duchêne, David A, Donnelly, Christl A, Bhatt, Samir

arXiv.org Artificial Intelligence, Dec-1-2023

Binary phylogenetic trees inferred from biological data are central to understanding the shared evolutionary history of organisms. Inferring the placement of latent nodes in a tree by any optimality criterion (e.g., maximum likelihood) is an NP-hard problem, which has driven the development of myriad heuristic approaches. Yet these heuristics often lack a systematic means of uniformly sampling random trees or of effectively exploring a tree space that grows factorially, both of which are crucial to optimisation problems such as machine learning. Phylo2Vec maps any binary tree with n leaves to an integer vector of length n − 1. We prove that Phylo2Vec is well-defined and is a bijection onto the space of binary phylogenetic trees. The advantages of Phylo2Vec are twofold: (i) easy uniform sampling of binary trees and (ii) a systematic ability to traverse tree space in very large or very small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill-climbing-based optimisation can efficiently traverse the vastness of tree space from a random tree to an optimal one.

Phylogenetic trees are a fundamental tool for depicting evolutionary processes, whether linguistic (the evolution of different languages and language families) or biological (the evolution of biological entities). In the latter field, phylogenetic trees are integral to multiple research domains, including evolution (Morlon et al., 2010), conservation (Rolland et al., 2011), and epidemiology, where they help us better understand infectious disease transmission dynamics (Ypma et al., 2013; Faria et al., 2021). A multitude of computer-readable formats have been proposed to store and represent (binary) phylogenetic trees. While basic data structures such as arrays or linked lists can be used for this purpose, the Newick format, as outlined by Olsen (1990) and Felsenstein (2004), has emerged as the standard notation.
Each parenthesis encloses a pair of leaf nodes or subtrees, separated by a comma.
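Phylo2Vec's own encoding and decoding rules are defined in the paper and are not reproduced here. As a hedged illustration of how an integer vector of length n − 1 can index rooted binary trees bijectively, which makes uniform sampling trivial, the sketch below uses a generic leaf-insertion scheme. The functions `decode`, `to_newick`, and `random_vector` and the constraint 0 ≤ v[i] ≤ 2i are our assumptions, not Phylo2Vec's definitions. Leaf i + 1 is attached to one of the 2i + 1 edges of the current tree (including the edge above the root), so there are 1 × 3 × 5 × ... × (2n − 3) = (2n − 3)!! valid vectors, exactly the number of rooted binary topologies on n leaves. The output uses the Newick parentheses described above, without branch lengths.

```python
import random


def decode(v):
    """Decode an integer vector v into a rooted binary tree in Newick format.

    Hypothetical leaf-insertion scheme (not the paper's decoder): start from leaf 0,
    then attach leaf i + 1 onto edge number v[i], where 0 <= v[i] <= 2 * i.
    Edge number j is the edge directly above the j-th node ever created.
    """
    children = {0: []}   # node id -> child node ids (leaves have none)
    parent = {0: None}   # node id -> parent node id (None for the root)
    label = {0: "0"}     # leaf node id -> leaf name
    order = [0]          # node ids in creation order; position = edge number
    root = 0
    for i, choice in enumerate(v):
        if not 0 <= choice <= 2 * i:
            raise ValueError(f"v[{i}] must lie in [0, {2 * i}], got {choice}")
        target = order[choice]
        leaf, inner = len(order), len(order) + 1
        label[leaf] = str(i + 1)
        children[leaf] = []
        old_parent = parent[target]
        # Splice a new internal node onto the edge above `target`.
        children[inner] = [target, leaf]
        parent[inner] = old_parent
        parent[target] = parent[leaf] = inner
        if old_parent is None:
            root = inner
        else:
            siblings = children[old_parent]
            siblings[siblings.index(target)] = inner
        order.extend([leaf, inner])
    return to_newick(root, children, label) + ";"


def to_newick(node, children, label):
    """Serialise the subtree rooted at `node` without branch lengths."""
    kids = children[node]
    if not kids:
        return label[node]
    return "(" + ",".join(to_newick(k, children, label) for k in kids) + ")"


def random_vector(n, rng=random):
    """Uniform draw over all valid vectors, hence uniform over rooted topologies."""
    return [rng.randint(0, 2 * i) for i in range(n - 1)]
```

For example, `decode([0, 2])` returns `((0,1),2);` and `decode([0, 1])` returns `(0,(1,2));`. Because each coordinate is drawn independently and uniformly from its own range, `decode(random_vector(n))` samples rooted topologies uniformly, which mirrors the uniform-sampling advantage claimed for Phylo2Vec.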

node, phylo2vec, representation

arXiv.org Artificial Intelligence

2304.12693

Country: