Undirected Networks
A POMDP Extension with Belief-dependent Rewards
Araya, Mauricio, Buffet, Olivier, Thomas, Vincent, Charpillet, François
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce rho-POMDPs, an extension of POMDPs where the reward function rho depends on the belief state. We show that, under the common assumption that rho is convex, the value function is also convex, which makes it possible to (1) approximate rho arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
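As an illustration of the PWLC approximation mentioned in this abstract, the following minimal sketch uses negative entropy as the convex belief-dependent reward and bounds it from below by the maximum over tangent hyperplanes. The choice of negative entropy, the tangent construction, the two-state belief space, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def neg_entropy(belief):
    """Assumed reward rho(b) = sum_s b(s) log b(s), which is convex in b."""
    return sum(p * math.log(p) for p in belief if p > 0.0)

def tangent_alpha(b0):
    """Tangent hyperplane to neg_entropy at an interior belief b0.

    On the probability simplex the tangent reduces to alpha(s) = log b0(s),
    so alpha . b = sum_s b(s) log b0(s). Requires every b0(s) > 0.
    """
    return [math.log(p) for p in b0]

def pwlc_reward(belief, alphas):
    """Piecewise linear and convex approximation: max over linear pieces."""
    return max(sum(a * p for a, p in zip(alpha, belief)) for alpha in alphas)

# Hypothetical anchor beliefs on a two-state simplex (interior points only).
anchors = [[i / 10, 1 - i / 10] for i in range(1, 10)]
alphas = [tangent_alpha(b0) for b0 in anchors]

if __name__ == "__main__":
    for b in ([0.5, 0.5], [0.33, 0.67], [0.95, 0.05]):
        print(b, neg_entropy(b), pwlc_reward(b, alphas))
```

Because each linear piece is tangent to a convex function, the approximation never exceeds the true reward, and adding more anchor beliefs tightens it. Each piece has the same form as the alpha-vectors used by standard POMDP solvers.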
A Monte Carlo AIXI Approximation
Veness, Joel, Ng, Kee Siong, Hutter, Marcus, Uther, William, Silver, David
This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.
Reinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs
We present a probabilistic logic programming framework for reinforcement learning that integrates reinforcement learning in POMDP environments with normal hybrid probabilistic logic programs with probabilistic answer set semantics, and that is capable of representing domain-specific knowledge. We formally prove the correctness of our approach. We show that the complexity of finding a policy for a reinforcement learning problem in our approach is NP-complete. In addition, we show that any reinforcement learning problem can be encoded as a classical logic program with answer set semantics. We also show that a reinforcement learning problem can be encoded as a SAT problem. We present a new high-level action description language that allows the factored representation of POMDPs. Moreover, we modify the original POMDP model so that it can distinguish between knowledge-producing actions and actions that change the environment.
An Introduction to Conditional Random Fields
Sutton, Charles, McCallum, Andrew
Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.
Hierarchical Multimodal Planning for Pervasive Interaction
Lin, Yong (University of Texas at Arlington) | Makedon, Fillia (University of Texas at Arlington)
Traditional dialogue management systems are tightly coupled with the sensing ability of a single computer. How to organize an interaction in pervasive environments to provide a friendly and integrated interface to users is an important issue. This requires a transition of human-computer interaction (HCI) from tight coupling to loose coupling. This paper proposes a hierarchical multimodal framework for pervasive interactions. Our system is designed to remind individuals with cognitive impairments of their activities of daily living. The system is composed of Markov decision processes for activity planning, and multimodal partially observable Markov decision processes for action planning and execution. Empirical results demonstrate that the hierarchical multimodal framework establishes a flexible mechanism for pervasive interaction systems.
Modeling and Measuring Self-Regulated Learning in Teachable Agent Environments
Kinnebrew, John S. (Vanderbilt University) | Biswas, Gautam (Vanderbilt University) | Sulcer, William B. (Vanderbilt University)
Our learning by teaching environment has students take on the role and responsibilities of a teacher to a virtual student named Betty. The environment is structured so that successfully instructing their teachable agent requires the students to learn and understand science topics for themselves. This process is supported by adaptive scaffolding and feedback from the system. This feedback is instantiated through the interactions with the teachable agent and a mentor agent, named Mr. Davis. This paper provides an overview of two studies that were conducted with 5th grade science students and a description of the analysis techniques that we have developed for interpreting students’ activities in this learning environment.
Automata Modeling for Cognitive Interference in Users' Relevance Judgment
Zhang, Peng (The Robert Gordon University) | Song, Dawei (The Robert Gordon University) | Hou, Yuexian (Tianjin University) | Wang, Jun (Robert Gordon University) | Bruza, Peter (Queensland University of Technology)
Quantum theory has recently been employed to further advance the theory of information retrieval (IR). A challenging research topic is to investigate the so-called quantum-like interference in users' relevance judgment process, where users are asked to judge the relevance degree of each document with respect to a given query. In this process, users' relevance judgment for the current document is often interfered with by the judgment for previous documents, due to the interference on users' cognitive status. Research from cognitive science has demonstrated some initial evidence of quantum-like cognitive interference in human decision making, which underpins the user's relevance judgment process. This motivates us to model such cognitive interference in the relevance judgment process, which in our belief will lead to a better modeling and explanation of user behaviors in the relevance judgment process for IR and eventually lead to more user-centric IR models. In this paper, we propose to use the probabilistic automaton (PA) and the quantum finite automaton (QFA), which are suitable to represent the transition of user judgment states, to dynamically model the cognitive interference when the user is judging a list of documents.
Scalable POMDPs for Diagnosis and Planning in Intelligent Tutoring Systems
Folsom-Kovarik, Jeremiah T. (University of Central Florida) | Sukthankar, Gita (University of Central Florida) | Schatz, Sae (University of Central Florida) | Nicholson, Denise (University of Central Florida)
A promising application area for proactive assistant agents is automated tutoring and training. Intelligent tutoring systems (ITSs) assist tutors and tutees by automating diagnosis and adaptive tutoring. These tasks are well modeled by a partially observable Markov decision process (POMDP) since it accounts for the uncertainty inherent in diagnosis. However, an important aspect of making POMDP solvers feasible for real-world problems is selecting appropriate representations for states, actions, and observations. This paper studies two scalable POMDP state and observation representations. State queues allow POMDPs to temporarily ignore less-relevant states. Observation chains represent information in independent dimensions using sequences of observations to reduce the size of the observation set. Preliminary experiments with simulated tutees suggest the experimental representations perform as well as lossless POMDPs, and can model much larger problems.
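To make concrete how a POMDP accounts for diagnostic uncertainty, here is a minimal sketch of the standard Bayesian belief update over hidden states. It is not the paper's state-queue or observation-chain representation; the two tutee states, the transition matrix, and the observation likelihoods are hypothetical values chosen for illustration.

```python
def belief_update(belief, transition, observation_likelihood):
    """Standard POMDP belief update for one fixed action and observation.

    belief[s]                   : current probability of hidden state s
    transition[s][s2]           : P(s2 | s, action)
    observation_likelihood[s2]  : P(observation | s2, action)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight by how well each state explains the observation.
    unnormalized = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]

# Hypothetical example: state 0 = tutee has a misconception, state 1 = mastered.
prior = [0.5, 0.5]
stay_put = [[1.0, 0.0], [0.0, 1.0]]   # assumed: asking a question changes nothing
wrong_answer = [0.8, 0.2]             # assumed P(wrong answer | state)
posterior = belief_update(prior, stay_put, wrong_answer)
```

After a wrong answer, the belief shifts toward the misconception state. The tutoring policy would then choose its next action from this updated belief rather than from a single assumed state.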
Policy Activation for Open-Ended Dialogue Management
Lison, Pierre (German Research Centre for Artificial Intelligence (DFKI GmbH)) | Kruijff, Geert-Jan M. (German Research Centre for Artificial Intelligence (DFKI))
An important difficulty in developing spoken dialogue systems for robots is the open-ended nature of most interactions. Robotic agents must typically operate in complex, continuously changing environments which are difficult to model and do not provide any clear, predefined goal. Directly capturing this complexity in a single, large dialogue policy is thus inadequate. This paper presents a new approach which tackles the complexity of open-ended interactions by breaking it into a set of small, independent policies, which can be activated and deactivated at runtime by a dedicated mechanism. The approach is currently being implemented in a spoken dialogue system for autonomous robots.
A Model for Verbal and Non-Verbal Human-Robot Collaboration
Matignon, Laetitia (University of Caen Basse Normandie) | Karami, Abir Beatrice (University of Caen Basse Normandie) | Mouaddib, Abdel-Illah (University of Caen Basse Normandie)
We are motivated by building a system for an autonomous robot companion that collaborates with a human partner to achieve a common mission. The objective of the robot is to infer the human's preferences over the tasks of the mission so as to collaborate with the human by carrying out the tasks the human does not favor. Inspired by recent research on the recognition of human intention, we propose a unified model that allows the robot to switch accurately between verbal and non-verbal interactions. Our system unifies an epistemic partially observable Markov decision process (POMDP), which is a human-robot spoken dialog system aiming at disambiguating the human's preferences, and an intuitive human-robot collaboration that consists of inferring the human's intention from the observed human actions. The beliefs over the human's preferences computed during the dialog are then reinforced in the course of the task execution by the intuitive interaction. Our unified model helps the robot infer the human's preferences and decide which tasks to perform to effectively satisfy these preferences. The robot is also able to adjust its plan rapidly in case of sudden changes in the human's preferences and to switch between both kinds of interaction. Experimental results on a scenario inspired by RoboCup@Home illustrate various specific behaviors of the robot during the collaborative mission.