bouckaert
Principal Component Analysis as a Sanity Check for Bayesian Phylolinguistic Reconstruction
Bayesian approaches to reconstructing the evolutionary history of languages rely on the tree model, which assumes that the languages descended from a common ancestor and were modified over time. However, this assumption can be violated to varying degrees by contact and other factors. Understanding the degree to which it is violated is crucial for validating the accuracy of phylolinguistic inference. In this paper, we propose a simple sanity check: projecting a reconstructed tree onto a space generated by principal component analysis. Using both synthetic and real data, we demonstrate that our method effectively visualizes anomalies, particularly in the form of jogging.
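The projection step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy feature matrix, the reconstructed internal-node states, the edge list, the helper names `fit_pca` and `project`, and the choice to fit the principal components on the leaf languages only.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Fit PCA via SVD; return the feature mean and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project feature vectors onto the fitted principal axes."""
    return (X - mean) @ components.T

# Hypothetical toy data: rows are leaf languages, columns are binary features.
leaves = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1]], dtype=float)

# Hypothetical reconstructed states of internal nodes (e.g. posterior means).
# The last row plays the role of the root.
internal = np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.0, 0.5, 1.0, 1.0],
                     [0.5, 0.5, 0.5, 0.5]])

# Tree edges as (parent, child) indices into the stacked node array:
# nodes 0-3 are leaves, 4-5 are inner nodes, 6 is the root.
edges = [(6, 4), (6, 5), (4, 0), (4, 1), (5, 2), (5, 3)]

# Assumption: fit the axes on the leaves only, then project every node.
mean, components = fit_pca(leaves)
coords = project(np.vstack([leaves, internal]), mean, components)

# Each branch becomes a line segment in the 2-D PCA space.
segments = [(coords[p], coords[c]) for p, c in edges]
```

Drawing `segments` with any plotting library gives the kind of picture the abstract refers to: a reconstruction consistent with the tree model should trace reasonably direct paths from the root to the leaves, while erratic branch directions would be a cue to inspect the reconstruction further.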
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks
This paper extends the work in [Suzuki, 1996] and presents an efficient depth-first branch-and-bound algorithm for learning Bayesian network structures, based on the minimum description length (MDL) principle, for a given (consistent) variable ordering. The algorithm exhaustively searches through all network structures and is guaranteed to find the network with the best MDL score. Preliminary experiments show that the algorithm is efficient and that its time complexity grows slowly with the sample size. The algorithm is useful for empirically studying both the performance of suboptimal heuristic search algorithms and the adequacy of the MDL principle in learning Bayesian networks.
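The idea of a depth-first branch-and-bound search under a fixed variable ordering can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it uses the common MDL form (data term plus a (log2 N)/2 penalty per free parameter) and a basic pruning rule, namely that once a parent set's penalty alone reaches the best score found so far, no superset can improve on it, since the data term is non-negative and the penalty only grows with more parents. The function names and the toy dataset are hypothetical.

```python
import math
from collections import Counter

def mdl_score(data, child, parents, arity):
    """Return (MDL score, penalty term) for `child` given `parents`."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    # Data term: N times the empirical conditional entropy, in bits.
    nll = -sum(c * math.log2(c / marg[pa]) for (pa, _), c in joint.items())
    # Penalty: (log2 N)/2 bits per free parameter of the conditional table.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    penalty = 0.5 * math.log2(n) * n_params
    return nll + penalty, penalty

def best_parents(data, child, candidates, arity):
    """Depth-first branch-and-bound over subsets of `candidates`."""
    best = {"score": math.inf, "parents": ()}

    def dfs(parents, start):
        score, penalty = mdl_score(data, child, parents, arity)
        if score < best["score"]:
            best.update(score=score, parents=parents)
        # Bound: every superset has penalty >= this penalty and data term >= 0,
        # so none of them can beat the best score once the penalty reaches it.
        if penalty >= best["score"]:
            return
        for i in range(start, len(candidates)):
            dfs(parents + (candidates[i],), i + 1)

    dfs((), 0)
    return best["parents"], best["score"]

def learn_structure(data, order, arity):
    """Choose each variable's parents from its predecessors in `order`."""
    return {v: best_parents(data, v, order[:i], arity)[0]
            for i, v in enumerate(order)}

# Hypothetical toy data: X1 copies X0, X2 is independent of both.
data = [(x0, x0, x2) for x0 in (0, 1) for x2 in (0, 1) for _ in range(10)]
structure = learn_structure(data, order=[0, 1, 2], arity=[2, 2, 2])
```

Because the ordering is fixed, each variable's parent set can be optimized independently, which is what keeps an exhaustive search with pruning feasible on small problems.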