Goto

Collaborating Authors

 Bayesian Learning


Advances in Bayesian methods for big data

#artificialintelligence

In the Big Data era, many scientific and engineering domains are producing massive data streams, with petabyte and exabyte scales becoming increasingly common. Besides the explosive growth in volume, Big Data also has high velocity, high variety, and high uncertainty. These complex data streams require ever-increasing processing speeds, economical storage, and timely response for decision making in highly uncertain environments, and have raised various challenges to conventional data analysis. With the primary goal of building intelligent systems that automatically improve from experiences, machine learning (ML) is becoming an increasingly important field to tackle big data challenges, with an emerging field of "Big Learning," which covers theories, algorithms and systems on addressing big data problems. Bayesian methods have been widely used in machine learning and many other areas.


40 Python Statistics For Data Science Resources

#artificialintelligence

For an introduction to statistics, this tutorial with real-life examples is the way to go. The notebooks of this tutorial will introduce you to concepts like mean, median, standard deviation, and the basics of topics such as hypothesis testing and probability distributions. A fine way to start your stats learning, since it is inspired by the books "Think Bayes" and "Think Stats", which are two top recommendations that will come back below! If you're looking for books, you can try out this free book on computational statistics in Python, which not only contains an introduction to programming with Python, but also treats topics such as Markov Chain Monte Carlo, the Expectation-Maximization (EM) algorithm, resampling methods, and much more. Or you can buy this book by Thomas Haslwanter for a general introduction to common statistical tests, linear regression analysis and topics from survival analysis and Bayesian statistics. Note that this book does take life and medical sciences as an application area. Both of the above books already introduce you to more advanced statistics topics with Python too, as you can see. If you're a fan of videos, you should consider watching this tutorial on statistical data analysis with SciPy with Christopher Fonnesbeck, an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine.



Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data

arXiv.org Machine Learning

Alzheimer's disease is a major cause of dementia. Its diagnosis requires accurate biomarkers that are sensitive to disease stages. In this respect, we regard probabilistic classification as a method of designing a probabilistic biomarker for disease staging. Probabilistic biomarkers naturally support the interpretation of decisions and evaluation of uncertainty associated with them. In this paper, we obtain probabilistic biomarkers via Gaussian Processes. Gaussian Processes enable probabilistic kernel machines that offer flexible means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we propose a new variation of Automatic Relevance Determination and tackle the challenges of high dimensionality through multiple kernels. Our research results demonstrate that the Gaussian Process models are competitive with or better than the well-known Support Vector Machine in terms of classification performance even in the cases of single kernel learning. Extending the basic scheme towards the Multiple Kernel Learning, we improve the efficacy of the Gaussian Process models and their interpretability in terms of the known anatomical correlates of the disease. For instance, the disease pathology starts in and around the hippocampus and entorhinal cortex. Through the use of Gaussian Processes and Multiple Kernel Learning, we have automatically and efficiently determined those portions of neuroimaging data. In addition to their interpretability, our Gaussian Process models are competitive with recent deep learning solutions under similar settings.


A New Measure of Conditional Dependence

arXiv.org Machine Learning

Measuring conditional dependencies among the variables of a network is of great interest to many disciplines. This paper studies some shortcomings of the existing dependency measures in detecting direct causal influences or their lack of ability for group selection to capture strong dependencies and accordingly introduces a new statistical dependency measure to overcome them. This measure is inspired by Dobrushin's coefficients and based on the fact that there is no dependency between $X$ and $Y$ given another variable $Z$, if and only if the conditional distribution of $Y$ given $X=x$ and $Z=z$ does not change when $X$ takes another realization $x'$ while $Z$ takes the same realization $z$. We show the advantages of this measure over the related measures in the literature. Moreover, we establish the connection between our measure and the integral probability metric (IPM) that helps to develop estimators of the measure with lower complexity compared to other relevant information theoretic based measures. Finally, we show the performance of this measure through numerical simulations.


Streaming Bayesian inference: theoretical limits and mini-batch approximate message-passing

arXiv.org Machine Learning

In statistical learning for real-world large-scale data problems, one must often resort to "streaming" algorithms which operate sequentially on small batches of data. In this work, we present an analysis of the information-theoretic limits of mini-batch inference in the context of generalized linear models and low-rank matrix factorization. In a controlled Bayes-optimal setting, we characterize the optimal performance and phase transitions as a function of mini-batch size. We base part of our results on a detailed analysis of a mini-batch version of the approximate message-passing algorithm (Mini-AMP), which we introduce. Additionally, we show that this theoretical optimality carries over into real-data problems by illustrating that Mini-AMP is competitive with standard streaming algorithms for clustering.


An Efficient Algorithm for Bayesian Nearest Neighbours

arXiv.org Machine Learning

K-Nearest Neighbours (k-NN) is a popular classification and regression algorithm, yet one of its main limitations is the difficulty in choosing the number of neighbours. We present a Bayesian algorithm to compute the posterior probability distribution for k given a target point within a data-set, efficiently and without the use of Markov Chain Monte Carlo (MCMC) methods or simulation - alongside an exact solution for distributions within the exponential family. The central idea is that data points around our target are generated by the same probability distribution, extending outwards over the appropriate, though unknown, number of neighbours. Once the data is projected onto a distance metric of choice, we can transform the choice of k into a change-point detection problem, for which there is an efficient solution: we recursively compute the probability of the last change-point as we move towards our target, and thus de facto compute the posterior probability distribution over k. Applying this approach to both a classification and a regression UCI data-sets, we compare favourably and, most importantly, by removing the need for simulation, we are able to compute the posterior probability of k exactly and rapidly. As an example, the computational time for the Ripley data-set is a few milliseconds compared to a few hours when using a MCMC approach.


Learning Structures of Bayesian Networks for Variable Groups

arXiv.org Artificial Intelligence

Bayesian networks, and especially their structures, are powerful tools for representing conditional independencies and dependencies between random variables. In applications where related variables form a priori known groups, chosen to represent different "views" to or aspects of the same entities, one may be more interested in modeling dependencies between groups of variables rather than between individual variables. Motivated by this, we study prospects of representing relationships between variable groups using Bayesian network structures. We show that for dependency structures between groups to be expressible exactly, the data have to satisfy the so-called groupwise faithfulness assumption. We also show that one cannot learn causal relations between groups using only groupwise conditional independencies, but also variable-wise relations are needed. Additionally, we present algorithms for finding the groupwise dependency structures.


Bayesian $l_0$ Regularized Least Squares

arXiv.org Machine Learning

Bayesian $l_0$-regularized least squares provides a variable selection technique for high dimensional predictors. The challenge in $l_0$ regularization is optimizing a non-convex objective function via search over model space consisting of all possible predictor combinations, a NP-hard task. Spike-and-slab (a.k.a. Bernoulli-Gaussian, BG) priors are the gold standard for Bayesian variable selection, with a caveat of computational speed and scalability. We show that a Single Best Replacement (SBR) algorithm is a fast scalable alternative. Although SBR calculates a sparse posterior mode, we show that it possesses a number of equivalences and optimality properties of a posterior mean. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike-and-slab priors. Finally, we conclude with directions for future research.


Forward-Backward Selection with Early Dropping

arXiv.org Machine Learning

Forward-backward selection is one of the most basic and commonly-used feature selection algorithms available. It is also general and conceptually applicable to many different types of data. In this paper, we propose a heuristic that significantly improves its running time, while preserving predictive accuracy. The idea is to temporarily discard the variables that are conditionally independent with the outcome given the selected variable set. Depending on how those variables are reconsidered and reintroduced, this heuristic gives rise to a family of algorithms with increasingly stronger theoretical guarantees. In distributions that can be faithfully represented by Bayesian networks or maximal ancestral graphs, members of this algorithmic family are able to correctly identify the Markov blanket in the sample limit. In experiments we show that the proposed heuristic increases computational efficiency by about two orders of magnitude in high-dimensional problems, while selecting fewer variables and retaining predictive performance. Furthermore, we show that the proposed algorithm and feature selection with LASSO perform similarly when restricted to select the same number of variables, making the proposed algorithm an attractive alternative for problems where no (efficient) algorithm for LASSO exists.