Bayesian Inference

Mathematics Behind AI & Machine Learning


Let's face reality, mathematics is far from being enjoyable. To learn it, we often lack time, and most importantly, motivation. Why do we need all these symbols and a bunch of figures? It turns out, a lot of sense. Especially if you have something to do with machine learning.

Optimal models of sound localization by barn owls

Neural Information Processing Systems

Sound localization by barn owls is commonly modeled as a matching procedure where localization cues derived from auditory inputs are compared to stored templates. While the matching models can explain properties of neural responses, no model explains how the owl resolves spatial ambiguity in the localization cues to produce accurate localization near the center of gaze. Here, we examine two models for the barn owl's sound localization behavior. First, we consider a maximum likelihood estimator in order to further evaluate the cue matching model. Second, we consider a maximum a posteriori estimator to test if a Bayesian model with a prior that emphasizes directions near the center of gaze can reproduce the owl's localization behavior.

MAP Estimation for Graphical Models by Likelihood Maximization

Neural Information Processing Systems

Computing a {\em maximum a posteriori} (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a finite mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation.

A Bayesian model for identifying hierarchically organised states in neural population activity

Neural Information Processing Systems

Neural population activity in cortical circuits is not solely driven by external inputs, but is also modulated by endogenous states which vary on multiple time-scales. To understand information processing in cortical circuits, we need to understand the statistical structure of internal states and their interaction with sensory inputs. Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity. Population states are modelled using a hidden Markov decision tree with state-dependent tuning parameters and a generalised linear observation model. We present a variational Bayesian inference algorithm for estimating the posterior distribution over parameters from neural population recordings.

A Bayesian method for reducing bias in neural representational similarity analysis

Neural Information Processing Systems

In neuroscience, the similarity matrix of neural activity patterns in response to different sensory stimuli or under different cognitive states reflects the structure of neural representational space. Existing methods derive point estimations of neural activity patterns from noisy neural imaging data, and the similarity is calculated from these point estimations. We show that this approach translates structured noise from estimated patterns into spurious bias structure in the resulting similarity matrix, which is especially severe when signal-to-noise ratio is low and experimental conditions cannot be fully randomized in a cognitive task. We propose an alternative Bayesian framework for computing representational similarity in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data, and directly estimate this covariance structure from imaging data while marginalizing over the unknown activity patterns. Converting the estimated covariance structure into a correlation matrix offers a much less biased estimate of neural representational similarity.

Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Neural Information Processing Systems

We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and to large treewidths. Our novel approach consistently outperforms the state of the art on experiments with up to thousands of variables.

A Bayesian Model of Conditioned Perception

Neural Information Processing Systems

We propose an extended probabilistic model for human perception. We argue that in many circumstances, human observers simultaneously evaluate sensory evidence under different hypotheses regarding the underlying physical process that might have generated the sensory information. Within this context, inference can be optimal if the observer weighs each hypothesis according to the correct belief in that hypothesis. But if the observer commits to a particular hypothesis, the belief in that hypothesis is converted into subjective certainty, and subsequent perceptual behavior is suboptimal, conditioned only on the chosen hypothesis. We demonstrate that this framework can explain psychophysical data of a recently reported decision-estimation experiment.

Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

Neural Information Processing Systems

In this paper, we derive a method to refine a Bayes network diagnostic model by exploiting constraints implied by expert decisions on test ordering. At each step, the expert executes an evidence gathering test, which suggests the test's relative diagnostic value. We demonstrate that consistency with an expert's test selection leads to non-convex constraints on the model parameters. We incorporate these constraints by augmenting the network with nodes that represent the constraint likelihoods. Gibbs sampling, stochastic hill climbing and greedy search algorithms are proposed to find a MAP estimate that takes into account test ordering constraints and any data available.

Stochastic Relational Models for Large-scale Dyadic Data using MCMC

Neural Information Processing Systems

Stochastic relational models provide a rich family of choices for learning and predicting dyadic data between two sets of entities. Previously empirical Bayesian inference was applied, which is however not scalable when the size of either object sets becomes tens of thousands. In this paper, we introduce a Markov chain Monte Carlo (MCMC) algorithm to scale the model to very large-scale dyadic data. Both superior scalability and predictive accuracy are demonstrated on a collaborative filtering problem, which involves tens of thousands users and a half million items. Papers published at the Neural Information Processing Systems Conference.