Markov Models
Integrating the Human Recommendations in the Decision Process of Autonomous Agents: A Goal Biased Markov Decision Process
Cote, Nicolas (GREYC - CNRS (UMR0672), Université) | Bouzid, Maroua (de Caen) | Mouaddib, Abdel-Illah ( GREYC - CNRS (UMR0672), Université)
In this paper, we address the problem of computing the policy of an autonomous agent, taking human recommendations into account which could be appropriate for mixed initiative, or adjustable autonomy. For this purpose, we present Goal Biased Markov Decision Process (GBMDP) which assume two kinds of recommendation. The human recommends to the agent to avoid some situations (represented by undesirable states), or he recommends favorable situations represented by desirable states. The agent takes those recommendations into account by updating its policy (only updating the states concerned by the recommendations, not the whole policy). We show that GBMDP is efficient and it improves the human's intervention by reducing its time of attention paid to the agent. Moreover, GBMDP optimizes robot's computation time by updating only the necessary states. We also show how GBMDP can consider more than one recommendation. Finally, our experiments show how we update policies which are intractable by standard approaches.
Context-Aware Recommender Systems
Adomavicius, Gediminas (University of Minnesota) | Mobasher, Bamshad (DePaul University) | Ricci, Francesco (Free University of Bozen-Bolzano) | Tuzhilin, Alexander (New York University)
Context-aware recommender systems (CARS) generate more relevant recommendations by adapting them to the specific contextual situation of the user. This article explores how contextual information can be used to create more intelligent and useful recommender systems. It provides an overview of the multifaceted notion of context, discusses several approaches for incorporating contextual information in recommendation process, and illustrates the usage of such approaches in several application areas where different types of contexts are exploited. The article concludes by discussing the challenges and future research directions for context-aware recommender systems.
First Order Decision Diagrams for Relational MDPs
Wang, Chenggang, Joshi, Saket, Khardon, Roni
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
Communication-Based Decomposition Mechanisms for Decentralized MDPs
Goldman, Claudia V., Zilberstein, Shlomo
Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.
Optimal and Approximate Q-value Functions for Decentralized POMDPs
Oliehoek, Frans A., Spaan, Matthijs T. J., Vlassis, Nikos
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
Bayesian Optimization for Adaptive MCMC
Mahendran, Nimalan, Wang, Ziyu, Hamze, Firas, de Freitas, Nando
This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
Topological Value Iteration Algorithms
Dai, P., null, Mausam, Weld, D. S., Goldsmith, J.
Value iteration is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, ILAO* and variants of RTDP are state-of-the-art ones. These methods use reachability analysis and heuristic search to avoid some unnecessary backups. However, none of these approaches build the graphical structure of the state transitions in a pre-processing step or use the structural information to systematically decompose a problem, whereby generating an intelligent backup sequence of the state space. In this paper, we present two optimal MDP algorithms. The first algorithm, topological value iteration (TVI), detects the structure of MDPs and backs up states based on topological sequences. It (1) divides an MDP into strongly-connected components (SCCs), and (2) solves these components sequentially. TVI outperforms VI and other state-of-the-art algorithms vastly when an MDP has multiple, close-to-equal-sized SCCs. The second algorithm, focused topological value iteration (FTVI), is an extension of TVI. FTVI restricts its attention to connected components that are relevant for solving the MDP. Specifically, it uses a small amount of heuristic search to eliminate provably sub-optimal actions; this pruning allows FTVI to find smaller connected components, thus running faster. We demonstrate that FTVI outperforms TVI by an order of magnitude, averaged across several domains. Surprisingly, FTVI also significantly outperforms popular `heuristically-informed' MDP algorithms such as ILAO*, LRTDP, BRTDP and Bayesian-RTDP in many domains, sometimes by as much as two orders of magnitude. Finally, we characterize the type of domains where FTVI excels --- suggesting a way to an informed choice of solver.
Resource Allocation Among Agents with MDP-Induced Preferences
Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.
An Algebraic Graphical Model for Decision with Uncertainties, Feasibilities, and Utilities
Pralet, C., Schiex, T., Verfaillie, G.
Numerous formalisms and dedicated algorithms have been designed in the last decades to model and solve decision making problems. Some formalisms, such as constraint networks, can express "simple" decision problems, while others are designed to take into account uncertainties, unfeasible decisions, and utilities. Even in a single formalism, several variants are often proposed to model different types of uncertainty (probability, possibility...) or utility (additive or not). In this article, we introduce an algebraic graphical model that encompasses a large number of such formalisms: (1) we first adapt previous structures from Friedman, Chu and Halpern for representing uncertainty, utility, and expected utility in order to deal with generic forms of sequential decision making; (2) on these structures, we then introduce composite graphical models that express information via variables linked by "local" functions, thanks to conditional independence; (3) on these graphical models, we finally define a simple class of queries which can represent various scenarios in terms of observabilities and controllabilities. A natural decision-tree semantics for such queries is completed by an equivalent operational semantics, which induces generic algorithms. The proposed framework, called the Plausibility-Feasibility-Utility (PFU) framework, not only provides a better understanding of the links between existing formalisms, but it also covers yet unpublished frameworks (such as possibilistic influence diagrams) and unifies formalisms such as quantified boolean formulas and influence diagrams. Our backtrack and variable elimination generic algorithms are a first step towards unified algorithms.
Instant Replay: Investigating statistical Analysis in Sports
Technology has had an unquestionable impact on the way people watch sports. Along with this technological evolution has come a higher standard to ensure a good viewing experience for the casual sports fan. It can be argued that the pervasion of statistical analysis in sports serves to satiate the fan's desire for detailed sports statistics. The goal of statistical analysis in sports is a simple one: to eliminate subjective analysis. In this paper, we review previous work that attempts to analyze various aspects in sports by using ideas from Markov Chains, Bayesian Inference and Markov Chain Monte Carlo (MCMC) methods. The unifying goal of these works is to achieve an accurate representation of the player's ability, the sport, or the environmental effects on the player's performance. With the prevalence of cheap computation, it is possible that using techniques in Artificial Intelligence could improve the result of statistical analysis in sport. This is best illustrated when evaluating football using Neuro Dynamic Programming, a Control Theory paradigm heavily based on theory in Stochastic processes. The results from this method suggest that statistical analysis in sports may benefit from using ideas from the area of Control Theory or Machine Learning