Bayesian Inference
Bayesian Hypernetworks
Krueger, David, Huang, Chin-Wei, Islam, Riashat, Turner, Ryan, Lacoste, Alexandre, Courville, Aaron
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0,I)$, to a distribution $q(\theta) \doteq q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta | \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\theta)$. We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.
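The role of the invertible map can be illustrated with a minimal sketch. It is not the authors' architecture: as an assumption for illustration, h is a lower-triangular affine map, theta = mu + L eps, standing in for a learned normalizing flow, and the names `sample_theta`, `mu`, and `L` are hypothetical. Because h is invertible, the density of a sampled theta follows from the change-of-variables formula, log q(theta) = log p(eps) - log|det L|, which is the quantity a sampling-based estimate of the variational lower bound needs.

```python
import math
import random


def sample_theta(mu, L, rng):
    """Draw eps ~ N(0, I) and return (theta, log q(theta)) for theta = mu + L @ eps.

    L is a lower-triangular matrix (list of rows) with a nonzero diagonal,
    so the map eps -> theta is invertible. This affine map is only a
    stand-in for a learned invertible network.
    """
    dim = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    theta = [
        mu[i] + sum(L[i][j] * eps[j] for j in range(i + 1))
        for i in range(dim)
    ]
    # Log density of the standard normal noise.
    log_p_eps = sum(-0.5 * e * e - 0.5 * math.log(2.0 * math.pi) for e in eps)
    # Change of variables: the log-determinant of a triangular Jacobian
    # is the sum of the logs of its absolute diagonal entries.
    log_det = sum(math.log(abs(L[i][i])) for i in range(dim))
    return theta, log_p_eps - log_det


# Hypothetical 2-parameter "primary network" with correlated weights.
rng = random.Random(0)
mu = [1.0, -1.0]
L = [[1.0, 0.0], [0.5, 2.0]]
theta, log_q = sample_theta(mu, L, rng)
```

Each call produces an independent sample of the parameters, and the off-diagonal entry of `L` introduces correlation between them. An affine map can only represent a Gaussian, so a multimodal posterior would need a more expressive invertible transformation.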
Bayesian Estimation of Signal Detection Models, Part 1
We begin by calculating the maximum likelihood estimates of the EVSDT parameters, separately for each participant in the data set. Before doing so, I note that this data processing is only required for manual calculation of the point estimates; the modeling methods described below take the raw data and therefore don't require this annoying step. First, we'll compute for each trial whether the participant's response was a hit, false alarm, correct rejection, or a miss. We'll do this by creating a new variable, type. Then we can simply count the numbers of these four types of trials for each participant, and put the counts on one row per participant. For a single subject, d' can be calculated as the difference of the standardized hit and false alarm rates (Stanislaw and Todorov 1999): \(d' = \Phi^{-1}(HR) - \Phi^{-1}(FAR)\), where \(\Phi\) is the standard normal cumulative distribution function. Its inverse, \(\Phi^{-1}\), converts a proportion (such as a hit rate or false alarm rate) into a z score.
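The manual calculation described above can be sketched in Python as follows. This is a sketch under assumptions: the function names and the example counts are hypothetical, and the criterion formula c = -(z(HR) + z(FAR)) / 2 is the standard EVSDT point estimate rather than something quoted in the text.

```python
from statistics import NormalDist


def classify_trial(is_old, said_old):
    """Label one trial as hit, miss, false alarm, or correct rejection."""
    if is_old:
        return "hit" if said_old else "miss"
    return "false alarm" if said_old else "correct rejection"


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Point estimates of d' and criterion c from one participant's counts."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for a single participant.
d_prime, criterion = dprime_and_criterion(
    hits=40, misses=10, false_alarms=10, correct_rejections=40
)
```

Note that `inv_cdf` raises an error for rates of exactly 0 or 1, so participants with perfect hit rates or no false alarms need a correction before this calculation applies.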
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Isele, David, Rostami, Mohammad, Eaton, Eric
Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.
Gaussian Processes for Data-Efficient Learning in Robotics and Control
Deisenroth, Marc Peter, Fox, Dieter, Rasmussen, Carl Edward
Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning makes it possible to reduce the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this article, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
How to sample from multidimensional distributions using Gibbs sampling?
We will show how to perform multivariate random sampling using one of the Markov Chain Monte Carlo (MCMC) algorithms, called the Gibbs sampler. To start, what are MCMC algorithms and what are they based on? Suppose we are interested in generating a random variable with a given target distribution. If we are not able to do this directly, we will be satisfied with generating a sequence of random variables which, in a sense, tends to the target distribution. The idea is to build a Markov chain whose stationary distribution is the target distribution.
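As a minimal sketch of the idea, consider a Gibbs sampler for a bivariate standard normal with correlation rho. The target distribution, the function name, and the parameter values are assumptions chosen for illustration. For this target each full conditional is itself normal, x | y ~ N(rho * y, 1 - rho^2), so the chain alternates between drawing x given y and y given x.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sample from a bivariate standard normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # std. dev. of each full conditional
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x)
        if i >= burn_in:  # discard the early, not-yet-stationary draws
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
```

Successive draws are correlated with each other, so the sample behaves like a smaller set of independent draws; the stronger the correlation rho, the slower the chain explores the target.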
Machine learning (the subfield of computer science that, according to Arthur Samuel, "gives computers the ability to learn without being explicitly programmed") is one of the most innovative and interesting fields of modern science around today. Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event, was pretty much left alone until the 1950s, when famed scientist Alan Turing managed to create and develop his imaginatively named 'Alan Turing's Learning Machine'. This was a huge breakthrough for the field and, along with the acceleration of computer development, the next few decades saw a gigantic rise in the development of machine learning techniques such as artificial neural networks and explanation-based learning. Yes, the explanation-based learning algorithm was fairly standard in that it created new business rules based on what had happened before.
Unifying Local and Global Change Detection in Dynamic Networks
Li, Wenzhe, Guo, Dong, Steeg, Greg Ver, Galstyan, Aram
Many real-world networks are complex dynamical systems, where both local (e.g., changing node attributes) and global (e.g., changing network topology) processes unfold over time. Local dynamics may provoke global changes in the network, and the ability to detect such effects could have profound implications for a number of real-world problems. Most existing techniques focus individually on either local or global aspects of the problem or treat the two in isolation from each other. In this paper we propose a novel network model that simultaneously accounts for both local and global dynamics. To the best of our knowledge, this is the first attempt at modeling and detecting local and global change points on dynamic networks via a unified generative framework. Our model is built upon the popular mixed membership stochastic blockmodels (MMSB) with sparse co-evolving patterns. We derive an efficient stochastic gradient Langevin dynamics (SGLD) sampler for our proposed model, which allows it to scale to potentially very large networks. Finally, we validate our model on both synthetic and real-world data and demonstrate its superiority over several baselines.
A Tutorial on Hawkes Processes for Events in Social Media
Rizoiu, Marian-Andrei, Lee, Young, Mishra, Swapnil, Xie, Lexing
This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data: we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
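The intensity function and a simulation scheme can be sketched briefly. This is not the chapter's code: as an assumption, the memory kernel is exponential, giving the intensity lambda(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i)), and simulation uses the thinning approach. The function names and parameter values are hypothetical.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Event intensity at time t given the earlier event times."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t
    )


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value just
        # after the current time bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate time from the bounding rate
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


# alpha / beta < 1 keeps the expected number of triggered events finite.
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=100.0)
```

Each accepted event raises the intensity by alpha, which then decays at rate beta; this is the self-exciting behavior that makes events cluster in time.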
Bayesian inference: Probabilistic machine learning and artificial intelligence (Nature)
A simple example of Bayesian inference applied to a medical diagnosis problem. Here the problem is diagnosing a rare disease using information from the patient's symptoms and, potentially, the patient's genetic marker measurements, which indicate predisposition (gen pred) to this disease. In this example, all variables are assumed to be binary. The relationships between variables are indicated by directed arrows and the probability of each variable given other variables they directly depend on is also shown. Yellow nodes denote measurable variables, whereas green nodes denote hidden variables.
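The kind of calculation behind such a diagnosis can be sketched with Bayes' theorem for a single binary symptom. The probabilities below are hypothetical values chosen for illustration, not the ones shown in the figure, and the genetic marker and predisposition variables are left out to keep the sketch short.

```python
def posterior_probability(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(disease | symptom) for binary disease and symptom variables."""
    # Joint probability of having the disease and showing the symptom.
    joint_disease = prior * p_symptom_given_disease
    # Joint probability of being healthy and showing the symptom.
    joint_healthy = (1.0 - prior) * p_symptom_given_healthy
    # Bayes' theorem: normalize by the total probability of the symptom.
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical numbers: a rare disease (1 in 1000), a symptom seen in 90%
# of patients with the disease and in 5% of people without it.
posterior = posterior_probability(
    prior=0.001,
    p_symptom_given_disease=0.9,
    p_symptom_given_healthy=0.05,
)
```

With these assumed numbers the posterior stays below 2%, because the disease is rare and most people who show the symptom do not have it.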
A GAMP Based Low Complexity Sparse Bayesian Learning Algorithm
Al-Shoukairi, Maher, Schniter, Philip, Rao, Bhaskar D.
In this paper, we present an algorithm for the sparse signal recovery problem that incorporates damped Gaussian generalized approximate message passing (GGAMP) into Expectation-Maximization (EM)-based sparse Bayesian learning (SBL). In particular, GGAMP is used to implement the E-step in SBL in place of matrix inversion, leveraging the fact that GGAMP is guaranteed to converge with appropriate damping. The resulting GGAMP-SBL algorithm is much more robust to arbitrary measurement matrix A than the standard damped GAMP algorithm while having much lower complexity than the standard SBL algorithm. We then extend the approach from the single measurement vector (SMV) case to the temporally correlated multiple measurement vector (MMV) case, leading to the GGAMP-TSBL algorithm. We verify the robustness and computational advantages of the proposed algorithms through numerical experiments. The problem of sparse signal recovery (SSR) and the related problem of compressed sensing have received much attention in recent years [1]-[6]. Despite the difficulty in solving this problem [7], an important finding in recent years is that for a sufficiently sparse x and a well designed A, accurate recovery is possible by techniques such as basis pursuit and orthogonal matching pursuit [8]-[10]. The SSR problem has seen considerable advances on the algorithmic front, including iteratively reweighted algorithms [11]-[13] and Bayesian techniques [14]-[20], among others. Two Bayesian techniques related to this work are the generalized approximate message passing (GAMP) and the sparse Bayesian learning (SBL) algorithms.