Bayesian Inference
The Three Faces of Bayes
Last summer, I was at a conference having lunch with Hal Daume III when we got to talking about how "Bayesian" can be a funny and ambiguous term. It seems like the definition should be straightforward: "following the work of English mathematician Rev. Thomas Bayes," perhaps, or even "uses Bayes' theorem." But many methods bearing the reverend's name or using his theorem aren't even considered "Bayesian" by his most religious followers. Why is it that Bayesian networks, for example, aren't considered… y'know… Bayesian? As I've read more outside the fields of machine learning and natural language processing -- from psychometrics and environmental biology to hackers who dabble in data science -- I've noticed three broad uses of the term "Bayesian."
Hawkes Processes with Stochastic Excitations
Lee, Young, Lim, Kar Wai, Ong, Cheng Soon
We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination between Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.
A probabilistic network for the diagnosis of acute cardiopulmonary diseases
Magrini, Alessandro, Luciani, Davide, Stefanini, Federico Mattia
We describe our experience in the development of a probabilistic network for the diagnosis of acute cardiopulmonary diseases. A panel of expert physicians collaborated to specify the qualitative part, that is a directed acyclic graph defining a factorization of the joint probability distribution of domain variables. The quantitative part, that is the set of all conditional probability distributions defined by each factor, was estimated in the Bayesian paradigm: we applied a special formal representation, characterized by a low number of parameters and a parameterization intelligible for physicians, elicited the joint prior distribution of parameters from medical experts, and updated it by conditioning on a dataset of hospital patient records using Markov Chain Monte Carlo simulation. Refinement was cyclically performed until the probabilistic network provided satisfactory Concordance Index values for a selection of acute diseases and reasonable inference on six fictitious patient cases. The probabilistic network can be employed to perform medical diagnosis on a total of 63 diseases (38 acute and 25 chronic) on the basis of up to 167 patient findings.
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling to about 168k publications with about 62k authors. The queried datasets are made available online. In three publicly available corpora in addition to the queried datasets, our proposed model demonstrates an improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task.
Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor Processes
Lim, Kar Wai, Buntine, Wray, Chen, Changyou, Du, Lan
The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications.
Clinical Tagging with Joint Probabilistic Models
Halpern, Yoni, Horng, Steven, Sontag, David
We describe a method for parameter estimation in bipartite probabilistic graphical models for joint prediction of clinical conditions from the electronic medical record. The method does not rely on the availability of gold-standard labels, but rather uses noisy labels, called anchors, for learning. We provide a likelihood-based objective and a moments-based initialization that are effective at learning the model parameters. The learned model is evaluated in a task of assigning a heldout clinical condition to patients based on retrospective analysis of the records, and outperforms baselines which do not account for the noisiness in the labels or do not model the conditions jointly.
Learning Continuous Time Bayesian Networks in Non-stationary Domains
Non-stationary continuous time Bayesian networks are introduced. They allow the parents set of each node to change over continuous time. Three settings are developed for learning non-stationary continuous time Bayesian networks from data: known transition times, known number of epochs and unknown number of epochs. A score function for each setting is derived and the corresponding learning algorithm is developed. A set of numerical experiments on synthetic data is used to compare the effectiveness of non-stationary continuous time Bayesian networks to that of non-stationary dynamic Bayesian networks. Furthermore, the performance achieved by non-stationary continuous time Bayesian networks is compared to that achieved by state-of-the-art algorithms on four real-world datasets, namely drosophila, saccharomyces cerevisiae, songbird and macroeconomics.
Bayesian Variable Selection for Globally Sparse Probabilistic PCA
Bouveyron, Charles, Latouche, Pierre, Mattei, Pierre-Alexandre
Sparse versions of principal component analysis (PCA) have imposed themselves as simple, yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables is difficult since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure called globally sparse probabilistic PCA (GSPPCA) that allows to obtain several sparse components with the same sparsity pattern. This allows the practitioner to identify the original variables which are relevant to describe the data. To this end, using Roweis' probabilistic interpretation of PCA and a Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. To avoid the drawbacks of discrete model selection, a simple relaxation of this framework is presented. It allows to find a path of models using a variational expectation-maximization algorithm. The exact marginal likelihood is then maximized over this path. This approach is illustrated on real and synthetic data sets. In particular, using unlabeled microarray data, GSPPCA infers much more relevant gene subsets than traditional sparse PCA algorithms.
Bayesian Regularization for #NeuralNetworks – Autonomous Agents -- #AI
Bayes's Theorem fundamentally is based on the concept of "validity of Beliefs". Reverend Thomas Bayes was a Presbyterian minster and a Mathematician who pondered much about developing the proof of existence of God. He came up with the Theorem in 18th century (which was later refined by Pierre-Simmon Laplace) to fix or establish the validity of'existing' or'previous' Beliefs in the face of best available'new' evidence. Think of it as a equation to correct prior beliefs based on new evidence. One of the popular example used to explain Bayes's Theorem is to detect if a patient has a certain disease or not.
Machine Learning in Robotics – 5 Modern Applications
As the term "machine learning" has heated up, interest in "robotics" (as expressed in Google Trends) has not altered much over the last three years. So how much of a place is there for machine learning in robotics? While only a portion of recent developments in robotics can be credited to developments and uses of machine learning, I've aimed to collect some of the more prominent applications together in this article, along with links and references. Before I delve into machine learning in robotics, go ahead and define "robot". Though at first this might seem simple, it's no easy task to come to an agreement on just what a robot is and what it is not, even amongst roboticists.