Bayesian Inference
Revisiting Gaussian Process Dynamical Models
Zhao, Jing (East China Normal University) | Sun, Shiliang (East China Normal University)
The recently proposed Gaussian process dynamical models (GPDMs) have been successfully applied to time series modeling. There are four learning algorithms for GPDMs: maximizing a posterior (MAP), fixing the kernel hyperparameters ฮฑ _ (Fix.ฮฑ _ ), balanced GPDM (B-GPDM) and two-stage MAP (T.MAP), which are designed for model training with complete data. When data are incomplete, GPDMs reconstruct the missing data using a function of the latent variables before parameter updates, which, however, may cause cumulative errors. In this paper, we present four new algorithms (MAP+, Fix.ฮฑ + , B-GPDM+ and T.MAP+) for learning GPDMs with incomplete training data and a new conditional model (CM+) for recovering incomplete test data. Our methods adopt the Bayesian framework and can fully and properly use the partially observed data. We conduct experiments on incomplete motion capture data (walk, run, swing and multiple-walker) and make comparisons with the existing four algorithms as well as k-NN, spline interpolation and VGPDS. Our methods perform much better on both training with incomplete data and recovering incomplete test data.
The Complexity of MAP Inference in Bayesian Networks Specified Through Logical Languages
Maua, Denis Deratani (Universidade de Sao Paulo) | Campos, Cassio Polpo de (Queen's University Belfast) | Cozman, Fabio Gagliardi (Universidade de Sao Paulo)
We study the computational complexity of finding maximum a posteriori configurations in Bayesian networks whose probabilities are specified by logical formulas. This approach leads to a fine grained study in which local information such as context-sensitive independence and determinism can be considered. It also allows us to characterize more precisely the jump from tractability to NP-hardness and beyond, and to consider the complexity introduced by evidence alone.
Bayesian Modelling of Community-Based Multidimensional Trust in Participatory Sensing under Data Sparsity
Venanzi, Matteo (University of Southampton) | Teacy, Luke (University of Southampton) | Rogers, Alex (University of Southampton) | Jennings, Nick (University of Southampton)
We propose a new Bayesian model for reliable aggregatio of crowdsourced estimates of real-valued quantities in participatory sensing applications. Existing approaches focus on probabilistic modelling of userโs reliability as the key to accurate aggregation. However, these are either limited to estimating discrete quantities, or require a significant number of reports from each user to accurately model their reliability. To mitigate these issues, we adopt a community-based approach, which reduces the data required to reliably aggregate real-valued estimates, by leveraging correlations between the reporting behaviour of users belonging to different communities. As a result, our method is up to 16.6% more accurate than existing state-of-the-art methods and is up to 49% more effective under data sparsity when used to estimate Wi-Fi hotspot locations in a real-world crowdsourcing application.
Differential Semantics of Intervention in Bayesian Networks
Qin, Biao (Renmin University of China)
Differentiation is an important inference method in Bayesian networks and intervention is a basic notion in causal Bayesian networks. In this paper, we reveal the connection between differentiation and intervention in Bayesian networks. We first encode an intervention as changing a conditional probabilistic table into a partial intervention table. We next introduce a jointree algorithm to compute the full atomic interventions of all nodes with respect to evidence in a Bayesian network. We further discover that an intervention has differential semantics if the intervention variables can reach the evidence in Bayesian networks and the output of the state-of-the-art algorithm is not the differentiation but the intervention of a Bayesian network if the differential nodes cannot reach any one of the evidence nodes. Finally, we present experimental results to demonstrate the efficiency of our algorithm to infer the causal effect in Bayesian networks.
From Weighted to Unweighted Model Counting
Chakraborty, Supratik (Indian Institute of Technology, Bombay) | Fried, Dror (Rice University) | Meel, Kuldeep S. (Rice University) | Vardi, Moshe Y. (Rice University)
The recent surge of interest in reasoning about probabilistic graphical models has led to the development of various techniques for probabilistic reasoning. Of these, techniques based on weighted model counting are particularly interesting since they can potentially leverage recent advances in unweighted model counting and in propositional satisfiability solving. In this paper, we present a new approach to weighted model counting via reduction to unweighted model counting. Our reduction, which is polynomial-time and preserves the normal form (CNF/DNF) of the input formula, allows us to exploit advances in unweighted model counting to solve weighted model counting instances. Experiments with weighted model counters built using our reduction indicate that these counters performs much better than a state-of-the-art weighted model counter
Probabilistic Inference Based Message-Passing for Resource Constrained DCOPs
Ghosh, Supriyo (Singapore Management University) | Kumar, Akshat (Singapore Management University) | Varakantham, Pradeep (Singapore Management University)
Distributed constraint optimization (DCOP) is an important framework for coordinated multiagent decision making. We address a practically useful variant of DCOP, called resource-constrained DCOP (RC-DCOP), which takes into account agents' consumption of shared limited resources. We present a promising new class of algorithm for RC-DCOPs by translating the underlying coordination problem to probabilistic inference. Using inference techniques such as expectation-maximization and convex optimization machinery, we develop a novel convergent message-passing algorithm for RC-DCOPs. Experiments on standard benchmarks show that our approach provides better quality than previous best DCOP algorithms and has much lower failure rate. Comparisons against an efficient centralized solver show that our approach provides near-optimal solutions, and is significantly faster on larger instances.
Bayesian Modeling with Gaussian Processes using the GPstuff Toolbox
Vanhatalo, Jarno, Riihimรคki, Jaakko, Hartikainen, Jouni, Jylรคnki, Pasi, Tolvanen, Ville, Vehtari, Aki
Gaussian processes (GP) are powerful tools for probabilistic modeling purposes. They can be used to define prior distributions over latent functions in hierarchical Bayesian models. The prior over functions is defined implicitly by the mean and covariance function, which determine the smoothness and variability of the function. The inference can then be conducted directly in the function space by evaluating or approximating the posterior process. Despite their attractive theoretical properties GPs provide practical challenges in their implementation. GPstuff is a versatile collection of computational tools for GP models compatible with Linux and Windows MATLAB and Octave. It includes, among others, various inference methods, sparse approximations and tools for model assessment. In this work, we review these tools and demonstrate the use of GPstuff in several models.
Solomonoff Induction Violates Nicod's Criterion
Nicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black. We show that Solomonoff induction does not satisfy Nicod's criterion: there are time steps in which observing black ravens decreases the belief in H. Moreover, while observing any computable infinite string compatible with H, the belief in H decreases infinitely often when using the unnormalized Solomonoff prior, but only finitely often when using the normalized Solomonoff prior. We argue that the fault is not with Solomonoff induction; instead we should reject Nicod's criterion.
Scalable Bayesian Optimization Using Deep Neural Networks
Snoek, Jasper, Rippel, Oren, Swersky, Kevin, Kiros, Ryan, Satish, Nadathur, Sundaram, Narayanan, Patwary, Md. Mostofa Ali, Prabhat, null, Adams, Ryan P.
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.
Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes
Zhou, Mingyuan, Padilla, Oscar Hernan Madrid, Scott, James G.
We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes. Because the models lead to closed-form Gibbs sampling update equations, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that, although the random count matrices within the family are defined by a row-wise construction, their columns can be shown to be i.i.d. This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices' combinatorial structure, we describe how to sequentially construct a column-i.i.d. random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma- and beta- negative binomial processes, especially their ability to model over-dispersed, heavy-tailed count data, makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to categorize a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features, and unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. Both the gamma- and beta- negative binomial processes are shown to significantly outperform the gamma-Poisson process for document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms.