Goto

Collaborating Authors

 Bayesian Learning


"Conditional Inter-Causally Independent" Node Distributions, a Property of "Noisy-Or" Models

arXiv.org Artificial Intelligence

This paper examines the interdependence generated between two parent nodes with a common instantiated child node, such as two hypotheses sharing common evidence. The relation so generated has been termed "intercausal." It is shown by construction that inter-causal independence is possible for binary distributions at one state of evidence. For such "CICI" distributions, the two measures of inter-causal effect, "multiplicative synergy" and "additive synergy" are equal. The well known "noisy-or" model is an example of such a distribution. This introduces novel semantics for the noisy-or, as a model of the degree of conflict among competing hypotheses of a common observation. In a general Bayesian network, the relation between a pair of nodes can be predictive, meaning we are interested in the effect of a node upon its successors, or, oppositely, diagnostic, where we infer the state of a node from knowledge of its successors.


Modeling a Sensor to Improve its Efficacy

arXiv.org Machine Learning

Robots rely on sensors to provide them with information about their surroundings. However, high-quality sensors can be extremely expensive and cost-prohibitive. Thus many robotic systems must make due with lower-quality sensors. Here we demonstrate via a case study how modeling a sensor can improve its efficacy when employed within a Bayesian inferential framework. As a test bed we employ a robotic arm that is designed to autonomously take its own measurements using an inexpensive LEGO light sensor to estimate the position and radius of a white circle on a black field. The light sensor integrates the light arriving from a spatially distributed region within its field of view weighted by its Spatial Sensitivity Function (SSF). We demonstrate that by incorporating an accurate model of the light sensor SSF into the likelihood function of a Bayesian inference engine, an autonomous system can make improved inferences about its surroundings. The method presented here is data-based, fairly general, and made with plug-and play in mind so that it could be implemented in similar problems.


Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference

arXiv.org Artificial Intelligence

Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.


Sparse estimation via nonconcave penalized likelihood in a factor analysis model

arXiv.org Machine Learning

We consider the problem of sparse estimation in a factor analysis model. A traditional estimation procedure in use is the following two-step approach: the model is estimated by maximum likelihood method and then a rotation technique is utilized to find sparse factor loadings. However, the maximum likelihood estimates cannot be obtained when the number of variables is much larger than the number of observations. Furthermore, even if the maximum likelihood estimates are available, the rotation technique does not often produce a sufficiently sparse solution. In order to handle these problems, this paper introduces a penalized likelihood procedure that imposes a nonconvex penalty on the factor loadings. We show that the penalized likelihood procedure can be viewed as a generalization of the traditional two-step approach, and the proposed methodology can produce sparser solutions than the rotation technique. A new algorithm via the EM algorithm along with coordinate descent is introduced to compute the entire solution path, which permits the application to a wide variety of convex and nonconvex penalties. Monte Carlo simulations are conducted to investigate the performance of our modeling strategy. A real data example is also given to illustrate our procedure.


Variational Semi-blind Sparse Deconvolution with Orthogonal Kernel Bases and its Application to MRFM

arXiv.org Machine Learning

We present a variational Bayesian method of joint image reconstruction and point spread function (PSF) estimation when the PSF of the imaging device is only partially known. To solve this semi-blind deconvolution problem, prior distributions are specified for the PSF and the 3D image. Joint image reconstruction and PSF estimation is then performed within a Bayesian framework, using a variational algorithm to estimate the posterior distribution. The image prior distribution imposes an explicit atomic measure that corresponds to image sparsity. Importantly, the proposed Bayesian deconvolution algorithm does not require hand tuning. Simulation results clearly demonstrate that the semi-blind deconvolution algorithm compares favorably with previous Markov chain Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly outperforms mismatched non-blind algorithms that rely on the assumption of the perfect knowledge of the PSF. The algorithm is illustrated on real data from magnetic resonance force microscopy (MRFM).


Herded Gibbs Sampling

arXiv.org Machine Learning

The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an $O(1/T)$ convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs. However, the convergence for herded Gibbs for sparsely connected probabilistic graphical models is still an open problem.


The Bounded Bayesian

arXiv.org Artificial Intelligence

The ideal Bayesian agent reasons from a global probability model, but real agents are restricted to simplified models which they know to be adequate only in restricted circumstances. Very little formal theory has been developed to help fallibly rational agents manage the process of constructing and revising small world models. The goal of this paper is to present a theoretical framework for analyzing model management approaches. For a probability forecasting problem, a search process over small world models is analyzed as an approximation to a larger-world model which the agent cannot explicitly enumerate or compute. Conditions are given under which the sequence of small-world models converges to the larger-world probabilities.


EigenGP: Sparse Gaussian process models with data-dependent eigenfunctions

arXiv.org Machine Learning

Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost and it is difficult to design nonstationary GP priors in practice. In this paper, we propose a sparse Gaussian process model, EigenGP, based on the Karhunen-Loรจve (KL) expansion of a GP prior. We use the Nystrรถm approximation to obtain data dependent eigenfunctions and select these eigenfunctions by evidence maximization. This selection reduces the number of eigenfunctions in our model and provides a nonstationary covariance function. To handle nonlinear likelihoods, we develop an efficient expectation propagation (EP) inference algorithm, and couple it with expectation maximization for eigenfunction selection. Because the eigenfunctions of a Gaussian kernel are associated with clusters of samples - including both the labeled and unlabeled - selecting relevant eigenfunctions enables EigenGP to conduct semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over alternative state-of-the-art sparse GP and semisupervised learning methods for regression, classification, and semisupervised classification.


Sidestepping the Triangulation Problem in Bayesian Net Computations

arXiv.org Artificial Intelligence

This paper presents a new approach for computing posterior probabilities in Bayesian nets, which sidesteps the triangulation problem. The current state of art is the clique tree propagation approach. When the underlying graph of a Bayesian net is triangulated, this approach arranges its cliques into a tree and computes posterior probabilities by appropriately passing around messages in that tree. The computation in each clique is simply direct marginalization. When the underlying graph is not triangulated, one has to first triangulated it by adding edges. Referred to as the triangulation problem, the problem of finding an optimal or even a ?good? triangulation proves to be difficult. In this paper, we propose to first decompose a Bayesian net into smaller components by making use of Tarjan's algorithm for decomposing an undirected graph at all its minimal complete separators. Then, the components are arranged into a tree and posterior probabilities are computed by appropriately passing around messages in that tree. The computation in each component is carried out by repeating the whole procedure from the beginning. Thus the triangulation problem is sidestepped.


A Decision Calculus for Belief Functions in Valuation-Based Systems

arXiv.org Artificial Intelligence

Valuation-based system (VBS) provides a general framework for representing knowledge and drawing inferences under uncertainty. Recent studies have shown that the semantics of VBS can represent and solve Bayesian decision problems (Shenoy, 1991a). The purpose of this paper is to propose a decision calculus for Dempster-Shafer (D-S) theory in the framework of VBS. The proposed calculus uses a weighting factor whose role is similar to the probabilistic interpretation of an assumption that disambiguates decision problems represented with belief functions (Strat 1990). It will be shown that with the presented calculus, if the decision problems are represented in the valuation network properly, we can solve the problems by using fusion algorithm (Shenoy 1991a). It will also be shown the presented decision calculus can be reduced to the calculus for Bayesian probability theory when probabilities, instead of belief functions, are given.