Goto

Collaborating Authors

 Learning Graphical Models


PAC-Bayesian Policy Evaluation for Reinforcement Learning

arXiv.org Machine Learning

Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.


Near-Optimal Target Learning With Stochastic Binary Signals

arXiv.org Machine Learning

We study learning in a noisy bisection model: specifically, Bayesian algorithms to learn a target value V given access only to noisy realizations of whether V is less than or greater than a threshold theta. At step t = 0, 1, 2, ..., the learner sets threshold theta t and observes a noisy realization of sign(V - theta t). After T steps, the goal is to output an estimate V^ which is within an eta-tolerance of V . This problem has been studied, predominantly in environments with a fixed error probability q < 1/2 for the noisy realization of sign(V - theta t). In practice, it is often the case that q can approach 1/2, especially as theta -> V, and there is little known when this happens. We give a pseudo-Bayesian algorithm which provably converges to V. When the true prior matches our algorithm's Gaussian prior, we show near-optimal expected performance. Our methods extend to the general multiple-threshold setting where the observation noisily indicates which of k >= 2 regions V belongs to.


Belief Propagation by Message Passing in Junction Trees: Computing Each Message Faster Using GPU Parallelization

arXiv.org Artificial Intelligence

Compiling Bayesian networks (BNs) to junction trees and performing belief propagation over them is among the most prominent approaches to computing posteriors in BNs. However, belief propagation over junction tree is known to be computationally intensive in the general case. Its complexity may increase dramatically with the connectivity and state space cardinality of Bayesian network nodes. In this paper, we address this computational challenge using GPU parallelization. We develop data structures and algorithms that extend existing junction tree techniques, and specifically develop a novel approach to computing each belief propagation message in parallel. We implement our approach on an NVIDIA GPU and test it using BNs from several applications. Experimentally, we study how junction tree parameters affect parallelization opportunities and hence the performance of our algorithm. We achieve speedups ranging from 0.68 to 9.18 for the BNs studied.


Measuring the Hardness of Stochastic Sampling on Bayesian Networks with Deterministic Causalities: the k-Test

arXiv.org Artificial Intelligence

Approximate Bayesian inference is NP-hard. Dagum and Luby defined the Local Variance Bound (LVB) to measure the approximation hardness of Bayesian inference on Bayesian networks, assuming the networks model strictly positive joint probability distributions, i.e. zero probabilities are not permitted. This paper introduces the k-test to measure the approximation hardness of inference on Bayesian networks with deterministic causalities in the probability distribution, i.e. when zero conditional probabilities are permitted. Approximation by stochastic sampling is a widely-used inference method that is known to suffer from inefficiencies due to sample rejection. The k-test predicts when rejection rates of stochastic sampling a Bayesian network will be low, modest, high, or when sampling is intractable.


The Structure of Signals: Causal Interdependence Models for Games of Incomplete Information

arXiv.org Artificial Intelligence

Traditional economic models typically treat private information, or signals, as generated from some underlying state. Recent work has explicated alternative models, where signals correspond to interpretations of available information. We show that the difference between these formulations can be sharply cast in terms of causal dependence structure, and employ graphical models to illustrate the distinguishing characteristics. The graphical representation supports inferences about signal patterns in the interpreted framework, and suggests how results based on the generated model can be extended to more general situations. Specific insights about bidding games in classical auction mechanisms derive from qualitative graphical models.


Symbolic Dynamic Programming for Discrete and Continuous State MDPs

arXiv.org Artificial Intelligence

Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DCMDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DCMDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD - a continuous variable extension of the algebraic decision diagram (ADD) - that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DCMDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.


Compressed Inference for Probabilistic Sequential Models

arXiv.org Artificial Intelligence

Hidden Markov models (HMMs) and conditional random fields (CRFs) are two popular techniques for modeling sequential data. Inference algorithms designed over CRFs and HMMs allow estimation of the state sequence given the observations. In several applications, estimation of the state sequence is not the end goal; instead the goal is to compute some function of it. In such scenarios, estimating the state sequence by conventional inference techniques, followed by computing the functional mapping from the estimate is not necessarily optimal. A more formal approach is to directly infer the final outcome from the observations. In particular, we consider the specific instantiation of the problem where the goal is to find the state trajectories without exact transition points and derive a novel polynomial time inference algorithm that outperforms vanilla inference techniques. We show that this particular problem arises commonly in many disparate applications and present experiments on three of them: (1) Toy robot tracking; (2) Single stroke character recognition; (3) Handwritten word recognition.


Price Updating in Combinatorial Prediction Markets with Bayesian Networks

arXiv.org Artificial Intelligence

To overcome the #P-hardness of computing/updating prices in logarithm market scoring rule-based (LMSR-based) combinatorial prediction markets, Chen et al. [5] recently used a simple Bayesian network to represent the prices of securities in combinatorial prediction markets for tournaments, and showed that two types of popular securities are structure preserving. In this paper, we significantly extend this idea by employing Bayesian networks in general combinatorial prediction markets. We reveal a very natural connection between LMSR-based combinatorial prediction markets and probabilistic belief aggregation, which leads to a complete characterization of all structure preserving securities for decomposable network structures. Notably, the main results by Chen et al. [5] are corollaries of our characterization. We then prove that in order for a very basic set of securities to be structure preserving, the graph of the Bayesian network must be decomposable. We also discuss some approximation techniques for securities that are not structure preserving.


Iterated risk measures for risk-sensitive Markov decision processes with discounted cost

arXiv.org Artificial Intelligence

We demonstrate a limitation of discounted expected utility, a standard approach for representing the preference to risk when future cost is discounted. Specifically, we provide an example of the preference of a decision maker that appears to be rational but cannot be represented with any discounted expected utility. A straightforward modification to discounted expected utility leads to inconsistent decision making over time. We will show that an iterated risk measure can represent the preference that cannot be represented by any discounted expected utility and that the decisions based on the iterated risk measure are consistent over time.


A Geometric Traversal Algorithm for Reward-Uncertain MDPs

arXiv.org Artificial Intelligence

Markov decision processes (MDPs) are widely used in modeling decision making problems in stochastic environments. However, precise specification of the reward functions in MDPs is often very difficult. Recent approaches have focused on computing an optimal policy based on the minimax regret criterion for obtaining a robust policy under uncertainty in the reward function. One of the core tasks in computing the minimax regret policy is to obtain the set of all policies that can be optimal for some candidate reward function. In this paper, we propose an efficient algorithm that exploits the geometric properties of the reward function associated with the policies. We also present an approximate version of the method for further speed up. We experimentally demonstrate that our algorithm improves the performance by orders of magnitude.