Goto

Collaborating Authors

 Undirected Networks


Learning Partially Observable Deterministic Action Models

Journal of Artificial Intelligence Research

We present exact algorithms for identifying deterministic-actions' effects and preconditions in dynamic partially observable domains. They apply when one does not know the action model(the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real world applications. They are challenging for AI tasks because traditional domain structures that underly tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic of simple logical structure and observation models have all features observed with some frequency. We yield tractable algorithms for the modified problem for such domains. Our algorithms take sequences of partial observations over time as input, and output deterministic action models that could have lead to those observations. The algorithms output all or one of those models (depending on our choice), and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees, and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.


Inference with Discriminative Posterior

arXiv.org Machine Learning

We study Bayesian discriminative inference given a model family $p(c,\x, \theta)$ that is assumed to contain all our prior information but still known to be incorrect. This falls in between "standard" Bayesian generative modeling and Bayesian regression, where the margin $p(\x,\theta)$ is known to be uninformative about $p(c|\x,\theta)$. We give an axiomatic proof that discriminative posterior is consistent for conditional inference; using the discriminative posterior is standard practice in classical Bayesian regression, but we show that it is theoretically justified for model families of joint densities as well. A practical benefit compared to Bayesian regression is that the standard methods of handling missing values in generative modeling can be extended into discriminative inference, which is useful if the amount of data is small. Compared to standard generative modeling, discriminative posterior results in better conditional inference if the model family is incorrect. If the model family contains also the true model, the discriminative posterior gives the same result as standard Bayesian generative modeling. Practical computation is done with Markov chain Monte Carlo.


Improved Estimation of High-dimensional Ising Models

arXiv.org Machine Learning

We consider the problem of jointly estimating the parameters as well as the structure of binary valued Markov Random Fields, in contrast to earlier work that focus on one of the two problems. We formulate the problem as a maximization of $\ell_1$-regularized surrogate likelihood that allows us to find a sparse solution. Our optimization technique efficiently incorporates the cutting-plane algorithm in order to obtain a tighter outer bound on the marginal polytope, which results in improvement of both parameter estimates and approximation to marginals. On synthetic data, we compare our algorithm on the two estimation tasks to the other existing methods. We analyze the method in the high-dimensional setting, where the number of dimensions $p$ is allowed to grow with the number of observations $n$. The rate of convergence of the estimate is demonstrated to depend explicitly on the sparsity of the underlying graph.


On the Possibility of Learning in Reactive Environments with Arbitrary Dependence

arXiv.org Artificial Intelligence

We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e. environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.


Randomized Distributed Configuration Management of Wireless Networks: Multi-layer Markov Random Fields and Near-Optimality

arXiv.org Artificial Intelligence

Distributed configuration management is imperative for wireless infrastructureless networks where each node adjusts locally its physical and logical configuration through information exchange with neighbors. Two issues remain open. The first is the optimality. The second is the complexity. We study these issues through modeling, analysis, and randomized distributed algorithms. Modeling defines the optimality. We first derive a global probabilistic model for a network configuration which characterizes jointly the statistical spatial dependence of a physical- and a logical-configuration. We then show that a local model which approximates the global model is a two-layer Markov Random Field or a random bond model. The complexity of the local model is the communication range among nodes. The local model is near-optimal when the approximation error to the global model is within a given error bound. We analyze the trade-off between an approximation error and complexity, and derive sufficient conditions on the near-optimality of the local model. We validate the model, the analysis and the randomized distributed algorithms also through simulation.


Online Planning Algorithms for POMDPs

Journal of Artificial Intelligence Research

Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.


A new Hedging algorithm and its application to inferring latent random variables

arXiv.org Artificial Intelligence

We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of the master algorithm receive zero weight. We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables.


Refining the Execution of Abstract Actions with Learned Action Models

Journal of Artificial Intelligence Research

Robots reason about abstract actions, such as "go to position `l'", in order to decide what to do or to generate plans for their intended course of action. The use of abstract actions enables robots to employ small action libraries, which reduces the search space for decision making. When executing the actions, however, the robot must tailor the abstract actions to the specific task and situation context at hand. In this article we propose a novel robot action execution system that learns success and performance models for possible specializations of abstract actions. At execution time, the robot uses these models to optimize the execution of abstract actions to the respective task contexts. The robot can so use abstract actions for efficient reasoning, without compromising the performance of action execution. We show the impact of our action execution model in three robotic domains and on two kinds of action execution problems: (1) the instantiation of free action parameters to optimize the expected performance of action sequences; (2) the automatic introduction of additional subgoals to make action sequences more reliable.


Adaptive Stochastic Resource Control: A Machine Learning Approach

Journal of Artificial Intelligence Research

The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemtive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties, such as it has finite state and action spaces, it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly, nu-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling are investigated, too. Finally, experimental results on both benchmark and industry-related data are presented.


Optimal and Approximate Q-value Functions for Decentralized POMDPs

Journal of Artificial Intelligence Research

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.