 Learning Graphical Models


Solving Uncertain MDPs with Objectives that Are Separable over Instantiations of Model Uncertainty

AAAI Conferences

Markov Decision Problems (MDPs) offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is difficult to obtain an exact specification of an MDP. We are interested in solving MDPs where the transition and reward functions are not exactly specified. Existing research has primarily focused on computing infinite-horizon stationary policies when optimizing robustness, regret, and percentile-based objectives. We focus specifically on finite-horizon problems, with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i.e., objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximization (CPM). (b) Second, we provide optimization-based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of the AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature.
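
To make the idea of a separable objective concrete, the following minimal sketch evaluates a fixed finite-horizon policy in each sampled model and averages the results, which is one way to read the AVM objective as a sum over instantiations. It is an illustration only, not the LDD method from the paper: the `Model` and `Policy` types, the function names, and the uniform weighting over samples are all assumptions made here.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (state, action) -> (reward, {next_state: probability}).
Model = Dict[Tuple[int, int], Tuple[float, Dict[int, float]]]
# Hypothetical encoding: policy[t][state] -> action taken at step t.
Policy = List[Dict[int, int]]


def policy_value(model: Model, policy: Policy, start: int) -> float:
    """Finite-horizon value of a fixed policy in ONE sampled model."""
    states = {s for (s, _) in model}
    value = {s: 0.0 for s in states}  # value after the final step is zero
    # Backward induction over the decision steps.
    for step in reversed(policy):
        new_value = {}
        for s in states:
            reward, transitions = model[(s, step[s])]
            new_value[s] = reward + sum(
                p * value[s2] for s2, p in transitions.items()
            )
        value = new_value
    return value[start]


def average_value(models: List[Model], policy: Policy, start: int) -> float:
    """Average of per-model values: a sum over instantiations (uniform weights assumed)."""
    return sum(policy_value(m, policy, start) for m in models) / len(models)
```

Because the objective is a plain sum of per-model terms, each term can be computed independently, which is the structural property a decomposition method can exploit.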


Real-Time Symbolic Dynamic Programming

AAAI Conferences

Recent advances in Symbolic Dynamic Programming (SDP) combined with the extended algebraic decision diagram (XADD) have provided exact solutions for expressive subclasses of finite-horizon Hybrid Markov Decision Processes (HMDPs) with mixed continuous and discrete state and action parameters. Unfortunately, SDP suffers from two major drawbacks: (1) it solves for all states and can be intractable for many problems that inherently have large optimal XADD value function representations; and (2) it cannot maintain compact (pruned) XADD representations for domains with nonlinear dynamics and reward due to the need for nonlinear constraint checking. In this work, we simultaneously address both of these problems by introducing real-time SDP (RTSDP). RTSDP addresses (1) by focusing the solution and value representation only on regions reachable from a set of initial states, and RTSDP addresses (2) by using visited states as witnesses of reachable regions to assist in pruning irrelevant or unreachable (nonlinear) regions of the value function. To this end, RTSDP enjoys provable convergence over the set of initial states and substantial space and time savings over SDP, as we demonstrate in a variety of hybrid domains ranging from inventory to reservoir to traffic control.


Exploiting Submodular Value Functions for Faster Dynamic Sensor Selection

AAAI Conferences

A key challenge in the design of multi-sensor systems is the efficient allocation of scarce resources such as bandwidth, CPU cycles, and energy, leading to the dynamic sensor selection problem in which a subset of the available sensors must be selected at each timestep. While partially observable Markov decision processes (POMDPs) provide a natural decision-theoretic model for this problem, the computational cost of POMDP planning grows exponentially in the number of sensors, making it feasible only for small problems. We propose a new POMDP planning method that uses greedy maximization to greatly improve scalability in the number of sensors. We show that, under certain conditions, the value function of a dynamic sensor selection POMDP is submodular and use this result to bound the error introduced by performing greedy maximization. Experimental results on a real-world dataset from a multi-camera tracking system in a shopping mall show that the proposed method achieves similar performance to existing methods but incurs only a fraction of the computational cost, leading to much better scalability in the number of cameras.
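
As a rough illustration of greedy maximization of a set function, the sketch below picks `k` items one at a time, each time adding the item with the largest marginal gain. It is a generic greedy routine, not the paper's POMDP planner: the function name `greedy_select`, the camera names, and the coverage-style value function in the usage note are assumptions made for this example.

```python
from typing import Callable, FrozenSet, Iterable, Set


def greedy_select(
    candidates: Iterable[str],
    k: int,
    value: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Greedily choose up to k candidates by largest marginal gain in `value`."""
    chosen: Set[str] = set()
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        base = value(frozenset(chosen))
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(remaining),
            key=lambda c: value(frozenset(chosen | {c})) - base,
        )
        chosen.add(best)
        remaining.remove(best)
    return chosen
```

For example, with a hypothetical coverage value such as `lambda s: len(set().union(*(cells[c] for c in s)))`, where `cells` maps each camera to the set of floor cells it observes, the routine needs on the order of `k * n` value evaluations instead of scoring every size-`k` subset. Coverage functions of this kind are a standard example of a monotone submodular function.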


Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes

AAAI Conferences

In many situations, it is desirable to optimize a sequence of decisions by maximizing a primary objective while respecting some constraints with respect to secondary objectives. Such problems can be naturally modeled as constrained partially observable Markov decision processes (CPOMDPs) when the environment is partially observable. In this work, we describe a technique based on approximate linear programming to optimize policies in CPOMDPs. The optimization is performed offline and produces a finite state controller with desirable performance guarantees. The approach outperforms a constrained version of point-based value iteration on a suite of benchmark problems.


Information Gathering and Reward Exploitation of Subgoals for POMDPs

AAAI Conferences

Planning in large partially observable Markov decision processes (POMDPs) is challenging, especially when a long planning horizon is required. A few recent algorithms successfully tackle this case, but at the expense of a weaker information-gathering capacity. In this paper, we propose Information Gathering and Reward Exploitation of Subgoals (IGRES), a randomized POMDP planning algorithm that leverages information in the state space to automatically generate "macro-actions" to tackle tasks with long planning horizons, while locally exploring the belief space to allow effective information gathering. Experimental results show that IGRES is an effective multi-purpose POMDP solver, providing state-of-the-art performance for both long-horizon planning tasks and information-gathering tasks on benchmark domains. Additional experiments with an ecological adaptive management problem indicate that IGRES is a promising tool for POMDP planning in real-world settings.


Self-Paced Learning for Matrix Factorization

AAAI Conferences

Matrix factorization (MF) has been attracting much attention due to its wide range of applications. However, since MF models are generally non-convex, most existing methods easily get stuck in bad local minima, especially in the presence of outliers and missing data. To alleviate this deficiency, in this study we present a new MF learning methodology that gradually includes matrix elements in MF training, from easy to complex. This corresponds to a recently proposed learning regime called self-paced learning (SPL), which has been demonstrated to be beneficial in avoiding bad local minima. We also generalize the conventional binary (hard) weighting scheme for SPL to a more effective real-valued (soft) weighting scheme. The effectiveness of the proposed self-paced MF method is substantiated by a series of experiments on synthetic, structure-from-motion, and background subtraction data.
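
The difference between binary (hard) and real-valued (soft) weighting can be sketched as follows. The hard rule keeps an element only if its current loss is below a pace threshold; the soft rule shown here is a simple linear ramp, chosen for illustration and not necessarily the weighting proposed in the paper. The function names and the threshold parameter `lam` are assumptions.

```python
from typing import List


def hard_weights(losses: List[float], lam: float) -> List[float]:
    """Binary weights: include an element (weight 1) only if its loss is below lam."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def linear_soft_weights(losses: List[float], lam: float) -> List[float]:
    """Real-valued weights: decay linearly from 1 (zero loss) to 0 (loss >= lam).

    This linear ramp is an assumed example of a soft scheme.
    """
    return [max(0.0, 1.0 - loss / lam) for loss in losses]
```

In a self-paced loop, such weights would multiply the per-element reconstruction losses, and `lam` would be increased over iterations so that harder elements enter training gradually.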


Optimal Estimation of Multivariate ARMA Models

AAAI Conferences

A central problem in applied data analysis is time series modeling (estimating and forecasting a discrete-time stochastic process), for which the autoregressive moving average (ARMA) and stochastic ARMA (Thiesson et al. 2012) are fundamental models. An ARMA model describes the behavior of a linear dynamical system under latent Gaussian perturbations (Brockwell and Davis 2002; Lütkepohl 2007), which affords intuitive modeling capability, efficient forecasting algorithms, and a close relationship to linear Gaussian state-space models (Katayama 2006, pp. 5-6). In this paper, we develop a tractable approach to maximum likelihood parameter estimation for stochastic multivariate ARMA models. To efficiently compute a globally optimal estimate, the problem is re-expressed as a regularized loss minimization, which then allows recent algorithmic advances in sparse estimation to be applied (Shah et al. 2012; Candes et al. 2011; Bach, Mairal, and Ponce 2008; Zhang et al. 2011; White et al. 2012). Although there has been recent progress in global estimation for ARMA, such approaches have either been restricted to single-input single-output ...
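
For readers unfamiliar with ARMA models, the sketch below generates a scalar ARMA(1,1) series from a given noise sequence using the standard recursion x_t = a * x_{t-1} + e_t + b * e_{t-1}. It only illustrates the model class: the paper concerns multivariate models and their estimation, which this sketch does not attempt. The function name, the scalar setting, and the zero initial conditions are assumptions.

```python
from typing import List, Sequence


def arma_1_1(noise: Sequence[float], ar: float, ma: float) -> List[float]:
    """Generate x_t = ar * x_{t-1} + e_t + ma * e_{t-1} from a noise sequence.

    Assumes zero initial conditions (x_{-1} = 0 and e_{-1} = 0).
    """
    series: List[float] = []
    prev_x = 0.0
    prev_e = 0.0
    for e in noise:
        x = ar * prev_x + e + ma * prev_e
        series.append(x)
        prev_x, prev_e = x, e
    return series
```

Feeding in Gaussian noise, for instance values drawn with `random.gauss(0.0, 1.0)`, gives a sample path driven by Gaussian perturbations; feeding in a single unit impulse exposes the model's impulse response.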


Gaussian Cardinality Restricted Boltzmann Machines

AAAI Conferences

The Restricted Boltzmann Machine (RBM) has been applied to a wide variety of tasks due to its advantages in feature extraction. Imposing a sparsity constraint on the activated hidden units is an important improvement to the RBM. In existing methods, the sparsity constraints are usually specified by users and are independent of the input data. However, the input data could be heterogeneous in content and thus naturally demand elastic and adaptive settings of the sparsity constraints. To solve this problem, we propose a generalized model with an adaptive sparsity constraint, named Gaussian Cardinality Restricted Boltzmann Machines (GC-RBM). In this model, the thresholds of hidden unit activations are decided by the input data and a given Gaussian distribution in the pre-training phase. We provide a principled method to train the GC-RBM with a Gaussian prior. Experimental results on two real-world data sets demonstrate the effectiveness of the proposed method and its superiority over CaRBM in terms of classification accuracy.


Compress and Control

AAAI Conferences

This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.


TODTLER: Two-Order-Deep Transfer Learning

AAAI Conferences

The traditional way of obtaining models from data, inductive learning, has proved itself both in theory and in many practical applications. However, in domains where data is difficult or expensive to obtain, e.g., medicine, deep transfer learning is a more promising technique. It circumvents the model acquisition difficulties caused by scarce data in a target domain by carrying over structural properties of a model learned in a source domain where training data is ample. Nonetheless, the lack of a principled view of transfer learning has so far limited its adoption. In this paper, we address this issue by regarding transfer learning as a process that biases learning in a target domain in favor of patterns useful in a source domain. Specifically, we consider a first-order logic model of the data as an instantiation of a set of second-order templates. Hence, the usefulness of a model is partly determined by the learner's prior distribution over these template sets. The main insight of our work is that transferring knowledge amounts to acquiring a posterior over the second-order template sets by learning in the source domain and using this posterior when learning in the target setting. Our experimental evaluation shows that our approach outperforms existing transfer learning techniques in terms of accuracy and runtime.