Bayesian Inference
Introduction to Bayesian Modeling with PyMC3 - Dr. Juan Camilo Orduz
We can also see this visually. We can verify the convergence of the chains formally using the Gelman-Rubin test. Values close to 1.0 indicate convergence. We can also test for correlation between samples in the chains. We are aiming for near-zero autocorrelation, so that the draws behave like independent "random" samples from the posterior distribution.
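The two diagnostics mentioned above can be sketched from scratch. The following is a minimal illustration, not PyMC3's implementation: it computes the classic (non-split) Gelman-Rubin statistic and a lag-k sample autocorrelation using only the standard library. The function names and the synthetic chains are assumptions made for the example.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor for one scalar parameter.

    `chains` is a list of equal-length lists of draws (hypothetical input format).
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])   # average within-chain variance
    b_over_n = variance(chain_means)          # variance of the chain means (B / n)
    var_hat = (n - 1) / n * w + b_over_n      # pooled estimate of the posterior variance
    return (var_hat / w) ** 0.5


def autocorrelation(chain, lag):
    """Sample autocorrelation of a single chain at the given lag."""
    mu = mean(chain)
    denom = sum((x - mu) ** 2 for x in chain)
    num = sum((chain[t] - mu) * (chain[t + lag] - mu) for t in range(len(chain) - lag))
    return num / denom


random.seed(0)

# Four well-mixed chains: independent draws from the same distribution.
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values: a deliberate non-convergence example.
stuck = [[random.gauss(offset, 1) for _ in range(1000)] for offset in (0, 1, 2, 3)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)

# A strongly autocorrelated chain (AR(1) with coefficient 0.9) for comparison.
ar_chain, x = [], 0.0
for _ in range(2000):
    x = 0.9 * x + random.gauss(0, 1)
    ar_chain.append(x)

acf_independent = autocorrelation(mixed[0], lag=1)
acf_correlated = autocorrelation(ar_chain, lag=1)
```

Here `rhat_mixed` should sit close to 1.0 while `rhat_stuck` is clearly larger, and `acf_correlated` should be far from zero while `acf_independent` is small. In practice one would rely on the diagnostics shipped with the sampling library rather than this sketch.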
Analyses of Multi-collection Corpora via Compound Topic Modeling
George, Clint P., Xia, Wei, Michailidis, George
As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text mining. Popular topic models have centered on a single text collection, which is insufficient for comparative text analyses. We consider a setting where one can partition the corpus into subcollections. Each subcollection shares a common set of topics, but there exists relative variation in topic proportions among collections. Incorporating any prior knowledge about the corpus (e.g. organization structure), we propose the compound latent Dirichlet allocation (cLDA) model, improving on previous work, encouraging generalizability, and depending less on user-input parameters. To identify the parameters of interest in cLDA, we study Markov chain Monte Carlo (MCMC) and variational inference approaches extensively, and suggest an efficient MCMC method. We evaluate cLDA qualitatively and quantitatively using both synthetic and real-world corpora. The usability study on some real-world corpora illustrates the superiority of cLDA, which not only explores the underlying topics automatically but also models their connections and variations across multiple collections.
Replacing the do-calculus with Bayes rule
Lattimore, Finnian, Rohde, David
The concept of causality has a controversial history. The question of whether it is possible to represent and address causal problems with probability theory, or whether fundamentally new mathematics such as the do-calculus is required, has been hotly debated. For example, Pearl (2001) states "the building blocks of our scientific and everyday knowledge are elementary facts such as 'mud does not cause rain' and 'symptoms do not cause disease' and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus". This has led to a dichotomy between advocates of causal graphical modeling and the do-calculus, and researchers applying Bayesian methods. In this paper we demonstrate that, while it is critical to explicitly model our assumptions on the impact of intervening in a system, provided we do so, estimating causal effects can be done entirely within the standard Bayesian paradigm. The invariance assumptions underlying causal graphical models can be encoded in ordinary probabilistic graphical models, allowing causal estimation with Bayesian statistics, equivalent to the do-calculus. Elucidating the connections between these approaches is a key step toward enabling the insights provided by each to be combined to solve real problems.
A Bayesian Solution to the M-Bias Problem
When regression-type models are used to infer causal effects, it is commonly held that inferring the correct causal relationship requires that extra covariates be included or "adjusted for". Without this adjustment, erroneous causal effects can be inferred. Given this phenomenon, it is common practice to include as many covariates as possible; however, such advice comes unstuck in the presence of M-bias. M-bias is a problem in causal inference where the correct estimation of treatment effects requires that certain variables are not adjusted for, i.e. are simply left out of the model. This issue caused a storm of controversy in 2009 when Rubin, Pearl and others disagreed about whether it could be problematic to include additional variables in models when inferring causal effects. This paper makes two contributions to this issue. Firstly, we provide a Bayesian solution to the M-bias problem. The solution replicates Pearl's solution, but consistent with Rubin's advice we condition on all variables. Secondly, the fact that we are able to offer a solution to this problem in Bayesian terms shows that it is indeed possible to represent causal relationships within the Bayesian paradigm, albeit in an extended space. We make several remarks on the similarities and differences between causal graphical models, which implement the do-calculus, and probabilistic graphical models, which enable Bayesian statistics. We hope this work will stimulate more research on unifying Pearl's causal calculus using causal graphical models with traditional Bayesian statistics and probabilistic graphical models.
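The M-bias phenomenon itself (not the paper's Bayesian solution) can be illustrated with a small simulation. The sketch below assumes a linear-Gaussian M-structure with two unobserved causes, `u1` and `u2`, a collider `z` that depends on both, a treatment `t` driven by `u1`, and an outcome `y` driven by `t` and `u2`. All coefficients, variable names, and the sample size are assumptions chosen for illustration; the estimates are ordinary least squares, computed by hand with the standard library.

```python
import random

random.seed(1)
TRUE_EFFECT = 1.0   # assumed causal effect of t on y
N = 20000

t_vals, z_vals, y_vals = [], [], []
for _ in range(N):
    u1 = random.gauss(0, 1)                        # unobserved cause of t and z
    u2 = random.gauss(0, 1)                        # unobserved cause of y and z
    z = u1 + u2 + random.gauss(0, 1)               # collider: adjusting for it opens a path
    t = u1 + random.gauss(0, 1)                    # treatment
    y = TRUE_EFFECT * t + u2 + random.gauss(0, 1)  # outcome
    t_vals.append(t)
    z_vals.append(z)
    y_vals.append(y)


def centered(values):
    """Subtract the sample mean from every value."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


t_c, z_c, y_c = centered(t_vals), centered(z_vals), centered(y_vals)

# OLS slope of y on t alone (z is left out of the model).
unadjusted = dot(t_c, y_c) / dot(t_c, t_c)

# OLS coefficient of t when z is also included, from the 2x2 normal equations.
s_tt, s_zz, s_tz = dot(t_c, t_c), dot(z_c, z_c), dot(t_c, z_c)
s_ty, s_zy = dot(t_c, y_c), dot(z_c, y_c)
adjusted = (s_zz * s_ty - s_tz * s_zy) / (s_tt * s_zz - s_tz ** 2)
```

Under these assumed coefficients, `unadjusted` should land near the true effect of 1.0, while `adjusted` converges to 0.8: conditioning on the collider `z` induces a dependence between `u1` and `u2` and biases the estimate.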
Bayesian Optimization with Binary Auxiliary Information
Zhang, Yehong, Dai, Zhongxiang, Low, Kian Hsiang
This paper presents novel mixed-type Bayesian optimization (BO) algorithms to accelerate the optimization of a target objective function by exploiting correlated auxiliary information of binary type that can be more cheaply obtained, such as in policy search for reinforcement learning and hyperparameter tuning of machine learning models with early stopping. To achieve this, we first propose a mixed-type multi-output Gaussian process (MOGP) to jointly model the continuous target function and binary auxiliary functions. Then, we propose information-based acquisition functions such as mixed-type entropy search (MT-ES) and mixed-type predictive ES (MT-PES) for mixed-type BO based on the MOGP predictive belief of the target and auxiliary functions. The exact acquisition functions of MT-ES and MT-PES cannot be computed in closed form and need to be approximated. We derive an efficient approximation of MT-PES via a novel mixed-type random features approximation of the MOGP model whose cross-correlation structure between the target and auxiliary functions can be exploited for improving the belief of the global target maximizer using observations from evaluating these functions. We propose new practical constraints to relate the global target maximizer to the binary auxiliary functions. We empirically evaluate the performance of MT-ES and MT-PES with synthetic and real-world experiments.
A Survey of Optimization Methods from a Machine Learning Perspective
Sun, Shiliang, Cao, Zehui, Zhu, Han, Zhao, Jing
Machine learning has developed rapidly, producing many theoretical breakthroughs and finding wide application in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth in the amount of data and the increase in model complexity, optimization methods in machine learning face more and more challenges. Much work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic review and summary of optimization methods from the perspective of machine learning is of great significance and can offer guidance for developments in both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we explore and present some challenges and open problems for optimization in machine learning.
From Incomplete, Dynamic Data to Bayesian Networks
Bayesian networks are a versatile and powerful tool to model complex phenomena and the interplay of their components in a probabilistically principled way. Moving beyond the comparatively simple case of completely observed, static data, which has received the most attention in the literature, in this paper we will review how Bayesian networks can model dynamic data and data with incomplete observations. Such data are the norm at the forefront of research and applications, and Bayesian networks are uniquely positioned to model them due to their explainability and interpretability.
SQIL: Imitation Learning via Regularized Behavioral Cloning
Reddy, Siddharth, Dragan, Anca D., Levine, Sergey
Learning to imitate expert behavior given action demonstrations containing high-dimensional, continuous observations and unknown dynamics is a difficult problem in robotic control. Simple approaches based on behavioral cloning (BC) suffer from state distribution shift, while more complex methods that generalize to out-of-distribution states can be difficult to use, since they typically involve adversarial optimization. We propose an alternative that combines the simplicity of BC with the robustness of adversarial imitation learning. The key insight is that under the maximum entropy model of expert behavior, BC corresponds to fitting a soft Q function that maximizes the likelihood of observed actions. This perspective suggests a way to regularize BC so that it generalizes to out-of-distribution states: combine the standard maximum-likelihood objective with a penalty on the soft Bellman error of the soft Q function. We show that this penalty term gives the agent an incentive to take actions that lead it back to demonstrated states when it encounters new states. Experiments show that our method outperforms BC and GAIL on a variety of image-based and low-dimensional environments in Box2D, Atari, and MuJoCo.
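The objective described in this abstract can be sketched for a tabular soft Q function with discrete actions. This is a minimal illustration under stated assumptions, not the authors' implementation: the reward used inside the soft Bellman error, the discount `gamma`, the penalty weight `lam`, and all function names are placeholders, since the abstract does not specify them.

```python
import math


def soft_value(q_row):
    """Soft value V(s) = log sum_a exp(Q(s, a)), computed in a numerically stable way."""
    top = max(q_row)
    return top + math.log(sum(math.exp(q - top) for q in q_row))


def bc_loss(q, demos):
    """Mean negative log-likelihood of demonstrated (state, action) pairs
    under the maximum entropy policy pi(a | s) = exp(Q(s, a) - V(s))."""
    return -sum(q[s][a] - soft_value(q[s]) for s, a in demos) / len(demos)


def soft_bellman_penalty(q, transitions, reward, gamma):
    """Mean squared soft Bellman error over (state, action, next_state) transitions."""
    errors = [
        q[s][a] - (reward + gamma * soft_value(q[s_next]))
        for s, a, s_next in transitions
    ]
    return sum(e * e for e in errors) / len(errors)


def regularized_bc_loss(q, demos, transitions, reward=0.0, gamma=0.9, lam=1.0):
    """Maximum-likelihood BC objective plus a weighted soft Bellman error penalty."""
    return bc_loss(q, demos) + lam * soft_bellman_penalty(q, transitions, reward, gamma)


# Toy problem: two states, two actions, Q initialised to zero (hypothetical data).
q_table = [[0.0, 0.0], [0.0, 0.0]]
demos = [(0, 1), (1, 0)]
transitions = [(0, 1, 1), (1, 0, 0)]

likelihood_term = bc_loss(q_table, demos)
penalty_term = soft_bellman_penalty(q_table, transitions, reward=0.0, gamma=0.9)
total_loss = regularized_bc_loss(q_table, demos, transitions)
```

With a zero Q table the policy is uniform, so the likelihood term equals log 2 per demonstration. The penalty is nonzero because the soft value of the next state is log 2 rather than 0. Minimising the combined loss over the entries of `q_table` would trade off fitting the demonstrated actions against consistency with the soft Bellman equation.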
Confidence intervals for class prevalences under prior probability shift
Point estimation of class prevalences in the presence of data set shift has been a popular research topic for more than two decades. Less attention has been paid to the construction of confidence and prediction intervals for estimates of class prevalences. One little considered question is whether or not it is necessary for practical purposes to distinguish confidence and prediction intervals. Another question so far not yet conclusively answered is whether or not the discriminatory power of the classifier or score at the basis of an estimation method matters for the accuracy of the estimates of the class prevalences. This paper presents a simulation study aimed at shedding some light on these and other related questions.
Automatic Relevance Determination Bayesian Neural Networks for Credit Card Default Modelling
Mbuvha, Rendani, Boulkaibet, Illyes, Marwala, Tshilidzi
Credit risk modelling is an integral part of the global financial system. While there has been great attention paid to neural network models for credit default prediction, such models often lack the required interpretation mechanisms and measures of the uncertainty around their predictions. This work develops and compares Bayesian Neural Networks (BNNs) for credit card default modelling. This includes BNNs trained by Gaussian approximation and the first implementation of BNNs trained by Hybrid Monte Carlo (HMC) in credit risk modelling. The results on the Taiwan Credit Dataset show that BNNs with Automatic Relevance Determination (ARD) outperform normal BNNs without ARD. The results also show that BNNs trained by Gaussian approximation display similar predictive performance to those trained by HMC. The results further show that BNNs with ARD can be used to draw inferences about the relative importance of different features, thus critically aiding decision makers in explaining model output to consumers. The robustness of this result is reinforced by high levels of congruence between the features identified as important using the two different approaches for training BNNs.