Gmytrasiewicz, Piotr


Interactive Agent that Understands the User

AAAI Conferences

Our work uses the notion of theory of mind to enable an interactive agent to keep track of the state of knowledge, goals and intentions of the human user, and to engage in and initiate sophisticated interactive behaviors using the decision-theoretic paradigm of maximizing expected utility. Currently, systems like Google Now and Siri mostly react to the user's requests and commands using hand-crafted responses, but they cannot initiate intelligent communication and plan for longer-term interactions. The reason is that they lack a clearly defined general objective of the interaction. Our main premise is that communication and interaction are types of action, so planning for communicative and interactive actions should be based on a unified framework of decision-theoretic planning. To facilitate this, the system's state of knowledge (a mental model) about the world has to include a probabilistic representation of what is known, what is uncertain, and how things change as different events transpire. Further, the state of the user's knowledge and intentions (the theory of the user's mind) needs to include a precise specification of what the system knows, and how uncertain it is, about the user's mental model and about her desires and intentions. The theories of mind may be further nested to form interactive beliefs. Finally, decision-theoretic planning proposes that the desirability of possible sequences of interactive and communicative actions be assessed as the expected utilities of alternative plans. We describe our preliminary implementation using the OpenCyc system, called MARTHA, and illustrate it in action using two simple interactive scenarios.
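
The decision-theoretic choice of a communicative act described above can be summarized in a few lines. The sketch below is illustrative only, not MARTHA's implementation; the names belief, utility, and best_communicative_act are assumptions introduced here.

    # Hedged sketch: pick the communicative act with maximum expected utility
    # under a probabilistic model (belief) of the user's mental state.
    def expected_utility(action, belief, utility):
        # belief: dict mapping candidate user mental states to probabilities
        # utility(action, state): payoff of `action` if the user's state is `state`
        return sum(p * utility(action, state) for state, p in belief.items())

    def best_communicative_act(actions, belief, utility):
        # Decision-theoretic planning in miniature: maximize expected utility.
        return max(actions, key=lambda a: expected_utility(a, belief, utility))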


Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs

AAAI Conferences

Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multi-agent environment. The framework extends POMDPs to multi-agent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents' actions using I-POMDPs, we propose an approach that effectively uses Bayesian inference and sequential Monte Carlo (SMC) sampling to learn others' intentional models, which ascribe to them beliefs, preferences and rationality in action selection. Empirical results show that our algorithm learns accurate models of the other agent and outperforms other methods. Our approach serves as a generalized Bayesian learning algorithm that learns other agents' beliefs, and transition, observation and reward functions. It also effectively mitigates the belief-space complexity due to the nested belief hierarchy.
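
A minimal sketch of the SMC idea, under assumptions of our own (the paper's actual proposal is richer): maintain a particle set over candidate intentional models of the other agent, reweight by the likelihood of each observed action, and resample. The function predict_action_prob is hypothetical.

    import random

    def smc_update(particles, weights, observed_action, predict_action_prob):
        # Reweight each candidate model by how well it predicts the observed action.
        weights = [w * predict_action_prob(model, observed_action)
                   for model, w in zip(particles, weights)]
        total = sum(weights)
        if total == 0:  # degenerate case: no particle explains the observation
            weights = [1.0 / len(particles)] * len(particles)
        else:
            weights = [w / total for w in weights]
        # Resample to concentrate particles on models that explain the data.
        particles = random.choices(particles, weights=weights, k=len(particles))
        return particles, [1.0 / len(particles)] * len(particles)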


Bayesian Learning of Other Agents' Finite Controllers for Interactive POMDPs

AAAI Conferences

We consider an autonomous agent that operates in a stochastic, partially observable, multiagent environment and explicitly models the other agents as probabilistic deterministic finite-state controllers (PDFCs) in order to predict their actions. We assume that such models are not given to the agent, but instead must be learned from (possibly imperfect) observations of the other agents' behavior. The agent maintains a belief over the other agents' models that is updated via Bayesian inference. To represent this belief we place a flexible stick-breaking distribution over PDFCs, which allows the posterior to concentrate around controllers whose size is not bounded and scales with the complexity of the observed data. Since this Bayesian inference task is not analytically tractable, we devise a Markov chain Monte Carlo algorithm to approximate the posterior distribution. The agent then embeds the result of this inference into its own decision-making process using the interactive POMDP framework. We show that our learning algorithm can learn agent models that are behaviorally accurate for problems of varying complexity, and that the agent's performance increases as a result.
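
To make the two ingredients named above concrete, here is an illustrative sketch (not the paper's code) of a PDFC and of a truncated stick-breaking draw; the truncation level and all identifiers are assumptions made for the example.

    import random

    class PDFC:
        # Probabilistic action choice at each node; deterministic node transitions.
        def __init__(self, action_probs, transitions):
            self.action_probs = action_probs  # node -> {action: probability}
            self.transitions = transitions    # (node, observation) -> next node

        def step(self, node, observation):
            actions, probs = zip(*self.action_probs[node].items())
            action = random.choices(actions, weights=probs)[0]
            return action, self.transitions[(node, observation)]

    def stick_breaking_weights(alpha, truncation):
        # w_k = beta_k * prod_{j<k}(1 - beta_j), with beta_k ~ Beta(1, alpha);
        # the truncation approximates the unbounded number of controller nodes.
        weights, remaining = [], 1.0
        for _ in range(truncation):
            beta = random.betavariate(1.0, alpha)
            weights.append(remaining * beta)
            remaining *= 1.0 - beta
        return weights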


MARTHA Speaks: Implementing Theory of Mind for More Intuitive Communicative Acts

AAAI Conferences

The theory of mind is an important human capability that allows us to understand and predict the goals, intents, and beliefs of other individuals. We present an approach to designing intelligent communicative agents based on modeling theories of mind. This is challenging because other agents may also have their own theories of mind of the first agent, meaning that these mental models are naturally nested in layers. To find intuitive communicative acts, we therefore recursively apply a planning algorithm in each of these nested layers, looking for possible plans of action as well as their hypothetical consequences, which include the reactions of other agents; we propose that truly intelligent communicative acts are the ones that produce a state of maximum decision-theoretic utility according to the entire theory of mind. We implement these ideas using Java and OpenCyc to create an assistive AI we call MARTHA. We demonstrate MARTHA's capabilities with two motivating examples: helping the user buy a sandwich and helping the user search for an activity. We see that, in addition to being a personal assistant, MARTHA can be extended to other assistive fields, such as finance, research, and government.
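
The recursive evaluation described above might be sketched as follows; this is a schematic reading of the abstract, not MARTHA's planner, and respond, transition, and reward are hypothetical callables supplied by the modeler.

    # Hedged sketch: value of an act at nesting level `level`, obtained by
    # simulating the other agent's response one level deeper and recursing.
    def plan_value(action, state, level, actions, transition, reward, respond):
        if level == 0:
            return reward(state, action)
        reaction = respond(state, action, level - 1)   # other agent's predicted reply
        next_state = transition(state, action, reaction)
        # Continue planning in the resulting state, one level shallower.
        return reward(state, action) + max(
            plan_value(a, next_state, level - 1, actions, transition, reward, respond)
            for a in actions)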


Nonparametric Bayesian Learning of Other Agents' Policies in Interactive POMDPs

AAAI Conferences

We consider an autonomous agent facing a partially observable, stochastic, multiagent environment where the unknown policies of other agents are represented as finite-state controllers (FSCs). We show how an agent can (i) learn the FSCs of the other agents, and (ii) exploit these models during interactions. To separate the issues of off-line versus on-line learning, we consider here an off-line two-phase approach. During the first phase the agent observes the other agent(s) interacting with the environment (the observations may be imperfect, and the learning agent does not take part in the interaction). The collected data is used to learn an ensemble of FSCs that explain the behavior of the other agent(s) using a Bayesian nonparametric (BNP) approach. We verify the quality of the learned models during the second phase by allowing the agent to compute its own optimal policy and interact with the observed agent. The optimal policy for the learning agent is obtained by solving an interactive POMDP in which the states are augmented by the other agent(s)' possible FSCs. The advantage of using the Bayesian nonparametric approach in the first phase is that the complexity (number of nodes) of the learned controllers is not bounded a priori. Our two-phase approach is preliminary and separates the learning using BNP from the complexities of learning on-line while the other agent may be modifying its policy (the on-line approach is the subject of our future work). We describe our implementation and results in a multiagent Tiger domain. Our results show that learning improves the agent's performance, which increases with the amount of data collected during the learning phase.
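
As one concrete reading of phase one, a candidate FSC can be scored against observed behavior with a standard forward pass; the sketch below is our illustration under assumed parameter tables (action_probs, node_trans), not the paper's implementation.

    import math

    def fsc_log_likelihood(trajectory, n_nodes, action_probs, node_trans, init):
        # trajectory: list of (action, observation) pairs attributed to the other
        # agent; init: prior over the controller's starting node.
        alpha = list(init)            # forward probabilities over controller nodes
        log_lik = 0.0
        for action, obs in trajectory:
            scored = [alpha[n] * action_probs[n][action] for n in range(n_nodes)]
            step = sum(scored)        # probability of the observed action
            log_lik += math.log(step)
            # Advance the forward pass through the node-transition probabilities.
            alpha = [sum(scored[n] * node_trans[n][action][obs][m]
                         for n in range(n_nodes)) / step
                     for m in range(n_nodes)]
        return log_lik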


Modeling Bounded Rationality of Agents During Interactions

AAAI Conferences

Frequently, it is advantageous for an agent to model other agents in order to predict their behavior during an interaction. Modeling others as rational has a long tradition in AI and game theory, but modeling other agents' departures from rationality is difficult and controversial. This paper proposes that bounded rationality be modeled as errors the modeled agent makes while deciding on its action. We are motivated by the work on quantal response equilibria in behavioral game theory, which uses Nash equilibria as the solution concept. In contrast, we use decision-theoretic maximization of expected utility. Quantal response assumes that a decision maker is rational, i.e., maximizes expected utility, but only approximately so, with an error rate characterized by a single error parameter. Another agent's error rate may be unknown and needs to be estimated during an interaction. We show that the error rate of the quantal response can be estimated using a Bayesian update of a suitable conjugate prior, and that it has a finite-dimensional sufficient statistic under strong simplifying assumptions. However, if the simplifying assumptions are relaxed, the quantal response does not admit a finite sufficient statistic and a more complex update is needed. This confirms the difficulty of using simple models of bounded rationality in general settings.
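
The single-parameter error model has a compact worked form: under quantal response, the modeled agent chooses action a with probability proportional to exp(lambda * EU(a)). The snippet below is a standard softmax illustration of that model, not code from the paper.

    import math

    def quantal_response(expected_utilities, lam):
        # expected_utilities: dict action -> expected utility; lam >= 0 is the
        # precision parameter: lam -> infinity recovers exact maximization,
        # lam = 0 gives uniformly random play.
        m = max(expected_utilities.values())       # subtract max for stability
        exps = {a: math.exp(lam * (u - m)) for a, u in expected_utilities.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    # Example: with lam = 2, the action worth 1.0 is preferred but not certain.
    print(quantal_response({"listen": 1.0, "open": 0.5}, lam=2.0))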


Hybrid Value Iteration for POMDPs

AAAI Conferences

The Partially Observable Markov Decision Process (POMDP) provides a rich mathematical model for designing agents that have to formulate plans under uncertainty. The curses of dimensionality and history associated with solving POMDPs have led to numerous refinements of the value iteration algorithm. Several exact methods with different pruning strategies have been devised, yet limited scalability has led research to focus on ways to approximate the optimal value function. One set of approximations relies on point-based value iteration, which maintains a fixed-size value function and is typically executed offline. Another set of approximations relies on tree search, which explores the implicit tree defined by the value iteration equation and is typically executed online. In this paper we present a hybrid value iteration algorithm that combines the benefits of point-based value iteration and tree search. Using our approach, a hybrid agent executes tree search online, and occasionally updates its offline-computed lower bound on the optimal value function, resulting in improved lookahead and higher obtained reward, while meeting real-time constraints. Thus, unlike other hybrid algorithms that use an invariant value function computed offline, our proposed scheme uses information from the real-time tree search process to reason whether to perform a point-based backup online. Keeping track of partial results obtained during online planning makes the computation of point-based backups less prohibitive. We report preliminary results that support our approach.
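
For reference, the point-based half of the hybrid can be summarized by the standard point-based backup at a single belief; the sketch below assumes a small discrete POMDP with explicit tables and is illustrative rather than the paper's implementation.

    def point_based_backup(b, V, A, S, Z, T, O, R, gamma):
        # One backup at belief b; V is a list of alpha-vectors (lists over states),
        # T[s][a][s2], O[a][s2][z], R[s][a] are transition/observation/reward tables.
        best = None
        for a in A:
            alpha_a = [R[s][a] for s in S]
            for z in Z:
                # Choose the alpha-vector that is best after taking a, observing z.
                g = max(V, key=lambda al: sum(
                    b[s] * sum(T[s][a][s2] * O[a][s2][z] * al[s2] for s2 in S)
                    for s in S))
                for s in S:
                    alpha_a[s] += gamma * sum(T[s][a][s2] * O[a][s2][z] * g[s2]
                                              for s2 in S)
            if best is None or (sum(b[s] * alpha_a[s] for s in S)
                                > sum(b[s] * best[s] for s in S)):
                best = alpha_a
        return best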


Reports of the AAAI 2010 Conference Workshops

AI Magazine

The AAAI-10 Workshop program was held Sunday and Monday, July 11–12, 2010 at the Westin Peachtree Plaza in Atlanta, Georgia. The AAAI-10 workshop program included 13 workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Fun, Bridging the Gap between Task and Motion Planning, Collaboratively-Built Knowledge Sources and Artificial Intelligence, Goal-Directed Autonomy, Intelligent Security, Interactive Decision Theory and Game Theory, Metacognition for Robust Social Systems, Model Checking and Artificial Intelligence, Neural-Symbolic Learning and Reasoning, Plan, Activity, and Intent Recognition, Statistical Relational AI, Visual Representations and Reasoning, and Abstraction, Reformulation, and Approximation. This article presents short summaries of those events.