Bayesian Inference
Switching to Learn
Shahrampour, Shahin, Rahimian, Mohammad Amin, Jadbabaie, Ali
Distributed estimation, detection, and learning theory in networks have attracted much attention over the past decades [1], [2], [3], [4], with applications that range from sensor and robotic networks [5], [6], [7], [8], [9] to social and economic networks [10], [11], [12]. In these scenarios, agents in a network need to learn the value of a parameter that they may not be able to infer on their own, but the global spread of information in the network provides them with adequate data to learn the truth collectively. As a result, agents iteratively exchange information with their neighbors. For instance, in distributed sensor and robotic networks, agents use local diffusion to augment their imperfect observations with information from their neighbors and achieve consensus and coordination [13], [14]. Similarly, agents exchange beliefs in social networks to benefit from each other's observations and private information and learn the unknown state of the world [15], [16]. Existing literature on distributed learning focuses mostly on environments where individuals communicate at every round. Of particular relevance to our discussion are a host of algorithms that follow the non-Bayesian learning scheme in Jadbabaie et.
Sublinear-Time Approximate MCMC Transitions for Probabilistic Programs
Chen, Yutian, Mansinghka, Vikash, Ghahramani, Zoubin
Probabilistic programming languages can simplify the development of machine learning techniques, but only if inference is sufficiently scalable. Unfortunately, Bayesian parameter estimation for highly coupled models such as regressions and state-space models still scales poorly; each MCMC transition takes linear time in the number of observations. This paper describes a sublinear-time algorithm for making Metropolis-Hastings (MH) updates to latent variables in probabilistic programs. The approach generalizes recently introduced approximate MH techniques: instead of subsampling data items assumed to be independent, it subsamples edges in a dynamically constructed graphical model. It thus applies to a broader class of problems and interoperates with other general-purpose inference techniques. Empirical results, including confirmation of sublinear per-transition scaling, are presented for Bayesian logistic regression, nonlinear classification via joint Dirichlet process mixtures, and parameter estimation for stochastic volatility models (with state estimation via particle MCMC). All three applications use the same implementation, and each requires under 20 lines of probabilistic code.
Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data
Gal, Yarin, Chen, Yutian, Ghahramani, Zoubin
Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks of sparse data.
The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models
Jampani, Varun, Nowozin, Sebastian, Loper, Matthew, Gehler, Peter V.
Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise due to the difficulty of posterior inference. As a result the community has favoured efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way with an "informed sampler" and in careful experiments demonstrate it on challenging generative models which contain renderer programs as their components. We concentrate on the problem of inverting an existing graphics rendering engine, an approach that can be understood as "Inverse Graphics". The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves significant improvements of inference.
Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks
Weiss, Jeremy C. (University of Wisconsin-Madison) | Natarajan, Sriraam (Indiana University) | Page, C. David (University of Wisconsin-Madison)
Applications of graphical models often require the use of approximate inference, such as sequential importance sampling (SIS), for estimation of the model distribution given partial evidence, i.e., the target distribution. However, when SIS proposal and target distributions are dissimilar, such procedures lead to biased estimates or require a prohibitive number of samples. We introduce ReBaSIS, a method that better approximates the target distribution by sampling variable by variable from existing importance samplers and accepting or rejecting each proposed assignment in the sequence: a choice made based on anticipating upcoming evidence. We relate the per-variable proposal and model distributions by expected weight ratios of sequence completions and show that we can learn accurate models of optimal acceptance probabilities from local samples. In a continuous-time domain, our method improves upon previous importance samplers by transforming an SIS problem into a machine learning one.
Loss-Calibrated Monte Carlo Action Selection
Abbasnejad, Ehsan (Australian National University and NICTA) | Domke, Justin (Australian National University and NICTA) | Sanner, Scott (Australian National University and NICTA)
Bayesian decision-theory underpins robust decision-making in applications ranging from plant control to robotics where hedging action selection against state uncertainty is critical for minimizing low probability but potentially catastrophic outcomes (e.g, uncontrollable plant conditions or robots falling into stairwells). Unfortunately, belief state distributions in such settings are often complex and/or high dimensional, thus prohibiting the efficient application of analytical techniques for expected utility computation when real-time control is required. This leaves Monte Carlo evaluation as one of the few viable (and hence frequently used) techniques for online action selection. However, loss-insensitive Monte Carlo methods may require large numbers of samples to identify optimal actions with high certainty since they may sample from highprobability regions that do not disambiguate action utilities. In this paper we remedy this problem by deriving an optimal proposal distribution for a loss-calibrated Monte Carlo importance sampler that bounds the regret of using an estimated optimal action. Empirically, we show that using our loss-calibrated Monte Carlo method yields high-accuracy optimal action selections in a fraction of the number of samples required by conventional loss-insensitive samplers.
Obtaining Well Calibrated Probabilities Using Bayesian Binning
Naeini, Mahdi Pakdaman (University of Pittsburgh) | Cooper, Gregory (University of Pittsburgh) | Hauskrecht, Milos (University of Pittsburgh)
However, model calibration and the learning is critical for many prediction and decision-making of well-calibrated probabilistic models have not been tasks in artificial intelligence. In this paper we present a new studied in the machine learning literature as extensively as nonparametric calibration method called Bayesian Binning for example discriminative machine learning models that into Quantiles (BBQ) which addresses key limitations of existing are built to achieve the best possible discrimination among calibration methods. The method post processes the classes of objects. One way to achieve a high level of model output of a binary classification algorithm; thus, it can be calibration is to develop methods for learning probabilistic readily combined with many existing classification algorithms.
A Bayesian Approach to Perceptual 3D Object-Part Decomposition Using Skeleton-Based Representations
El-Gaaly, Tarek (Rutgers University) | Froyen, Vicky (Rutgers University) | Elgammal, Ahmed (Rutgers University) | Feldman, Jacob (Rutgers University) | Singh, Manish (Rutgers University)
We present a probabilistic approach to shape decomposition that creates a skeleton-based shape representation of a 3D object while simultaneously decomposing it into constituent parts. Our approach probabilistically combines two prominent threads from the shape literature: skeleton-based (medial axis) representations of shape, and part-based representations of shape, in which shapes are combinations of primitive parts. Our approach recasts skeleton-based shape representation as a mixture estimation problem, allowing us to apply probabilistic estimation techniques to the problem of 3D shape decomposition, extending earlier work on the 2D case. The estimated 3D shape decompositions approximate human shape decomposition judgments. We present a tractable implementation of the framework, which begins by over-segmenting objects at concavities, and then probabilistically merges them to create a distribution over possible decompositions. This results in a hierarchy of decompositions at different structural scales, again closely matching known properties of human shape representation. The probabilistic estimation procedures that arise naturally in the model allow effective prediction of missing parts. We present results on shapes from a standard database illustrating the effectiveness of the approach.
An Exact Algorithm for Solving Most Relevant Explanation in Bayesian Networks
Zhu, Xiaoyuan (Queens College, City University of New York) | Yuan, Changhe (Queens College, City University of New York)
Most Relevant Explanation (MRE) is a new inference task in Bayesian networks that finds the most relevant partial instantiation of target variables as an explanation for given evidence by maximizing the Generalized Bayes Factor (GBF). No exact algorithm has been developed for solving MRE previously. This paper fills the void and introduces a breadth-first branch-and-bound MRE algorithm based on a novel upper bound on GBF. The bound is calculated by decomposing the computation of the score to a set of Markov blankets of subsets of evidence variables. Our empirical evaluations show that the proposed algorithm scales up exact MRE inference significantly.
Just Count the Satisfied Groundings: Scalable Local-Search and Sampling Based Inference in MLNs
Venugopal, Deepak (The University of Texas at Dallas) | Sarkhel, Somdeb (The University of Texas at Dallas) | Gogate, Vibhav (The University of Texas at Dallas)
The main computational bottleneck in various sampling based and local-search based inference algorithms for Markov logic networks (e.g., Gibbs sampling, MC-SAT, MaxWalksat, etc.) is computing the number of groundings of a first-order formula that are true given a truth assignment to all of its ground atoms. We reduce this problem to the problem of counting the number of solutions of a constraint satisfaction problem (CSP) and show that during their execution, both sampling based and local-search based algorithms repeatedly solve dynamic versions of this counting problem. Deriving from the vast amount of literature on CSPs and graphical models, we propose an exact junction-tree based algorithm for computing the number of solutions of the dynamic CSP, analyze its properties, and show how it can be used to improve the computational complexity of Gibbs sampling and MaxWalksat. Empirical tests on a variety of benchmarks clearly show that our new approach is several orders of magnitude more scalable than existing approaches.