The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems

AAAI Conferences

This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.


MDPVIS: An Interactive Visualization for Testing Markov Decision Processes

AAAI Conferences

A common approach for solving Markov Decision Processes is to implement a simulator of the stochastic dynamics of the MDP and a Monte Carlo optimization algorithm that invokes this simulator. The resulting software system is often ... Whereas computational steering traditionally refers to modifying a computer process during its execution (Mulder, van Wijk, and van Liere 1999), we treat optimization as an open-ended process whose parameters are repeatedly changed for testing and debugging.
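The pairing of a stochastic simulator with a Monte Carlo routine that invokes it can be sketched roughly as follows. This is an illustrative toy, not MDPVIS or its authors' code: the five-state chain, the 0.8 success probability, the reward, and the names `simulate_step`, `rollout_return`, and `estimate_value` are all assumptions made for the example, and the "optimization" is simply picking the better of two fixed policies by their Monte Carlo value estimates.

```python
import random


def simulate_step(state, action, rng):
    """Hypothetical stochastic dynamics on a chain of states 0..4.

    The requested move (+1 or -1) succeeds with probability 0.8;
    otherwise the agent stays put. Reward is 1.0 while in state 4.
    """
    if rng.random() < 0.8:
        state = max(0, min(4, state + action))
    reward = 1.0 if state == 4 else 0.0
    return state, reward


def rollout_return(policy, rng, horizon=10, start=0):
    """Run one simulated episode and return its undiscounted return."""
    state, total = start, 0.0
    for _ in range(horizon):
        state, reward = simulate_step(state, policy(state), rng)
        total += reward
    return total


def estimate_value(policy, episodes=2000, seed=0):
    """Monte Carlo estimate of a policy's expected return from the start state."""
    rng = random.Random(seed)
    return sum(rollout_return(policy, rng) for _ in range(episodes)) / episodes


def always_right(state):
    return 1


def always_left(state):
    return -1


# Choose the candidate policy with the higher estimated value.
best_policy = max([always_right, always_left], key=estimate_value)
```

Because the optimizer only sees sampled returns, a bug in either the simulator or the optimizer shows up as a change in the same estimates, which is one reason such systems are hard to test and debug.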


Deep Recurrent Q-Learning for Partially Observable MDPs

AAAI Conferences

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.


Commitment Semantics for Sequential Decision Making Under Reward Uncertainty

AAAI Conferences

A commitment represents an agent's intention to attempt to bring about some state of the world that is desired by some agent (possibly itself) in the future. Thus, by making a commitment, an agent is agreeing to make sequential decisions that it believes can cause the desired state to arise. In general, though, an agent's actions will have uncertain outcomes, and thus reaching the desired state cannot be guaranteed. For such sequential decision settings with uncertainty, therefore, commitments can only be probabilistic. We argue that standard notions of commitment are insufficient for probabilistic commitments, and propose a new semantics that judges commitment fulfillment not in terms of whether the agent achieved the desired state, but rather in terms of whether the agent made sequential decisions that in expectation would have achieved the desired state with (at least) the promised probability. We have devised various algorithms that operationalize our semantics, to capture problem contexts with probabilistic commitments arising because action outcomes are uncertain, as well as arising because an agent might realize over time that it does not want to fulfill the commitment.


Probabilistic Planning for Decentralized Multi-Robot Systems

AAAI Conferences

Multi-robot systems are an exciting application domain for AI research and Dec-POMDPs, specifically. MacDec-POMDP methods can produce high-quality general solutions for realistic heterogeneous multi-robot coordination problems by automatically generating control and communication policies, given a model. In contrast to most existing multi-robot methods that are specialized to a particular problem class, our approach can synthesize policies that exploit any opportunities for coordination that are present in the problem, while balancing uncertainty, sensor information, and information about other agents.


Complexity of Self-Preserving, Team-Based Competition in Partially Observable Stochastic Games

AAAI Conferences

Partially observable stochastic games (POSGs) are a robust and precise model for decentralized decision making under conditions of imperfect information, and extend popular Markov decision problem models. Complexity results for a wide range of such problems are known when agents work cooperatively to pursue common interests. When agents compete, things are less well understood. We show that under one understanding of rational competition, such problems are complete for the class NEXP^NP. This result holds for any such problem comprised of two competing teams of agents, where teams may be of any size whatsoever.


Planning Under Uncertainty with Weighted State Scenarios

AAAI Conferences

External factors are hard to model using a Markovian state in several real-world planning domains. Although planning can be difficult in such domains, it may be possible to exploit long-term dependencies between states of the environment during planning. We introduce weighted state scenarios to model long-term sequences of states, and we use a model based on a Partially Observable Markov Decision Process to reason about scenarios during planning. Experiments show that our model outperforms other methods for decision making in two real-world domains.


Robotic Social Feedback for Object Specification

AAAI Conferences

Issuing and following instructions is a common task in many forms of both human-human and human-robot collaboration. With two human participants, the accuracy of instruction following increases if the collaborators can monitor the state of their partners and respond to them through conversation (Clark and Krych 2004), a process we call social feedback. Despite this benefit in human-human interaction, current human-robot collaboration systems process instructions in non-incremental batches, which can achieve good accuracy but does not allow for reactive feedback (Tellex et al. 2011; Matuszek et al. 2012; Tellex et al. 2012; Misra et al. 2014). In this paper, we show that giving a robot the ability to ask the user questions results in responsive conversations and allows the robot to quickly determine the object that the user desires. This social feedback loop between person and robot allows a person to create an internal model for the robot’s mental state and adapt their own behavior to better inform the robot. To close the human-robot feedback loop, we employ a Partially Observable Markov Decision Process (POMDP) to produce a policy which will lead to the determination of the object in the shortest amount of time. To test our approach, we perform user studies to measure our robot’s ability to deliver common household items requested by the participant. We compare delivery speed and accuracy both with and without social feedback.
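A POMDP agent of this kind maintains a belief (a probability distribution) over the hidden state, here the object the user wants, and revises it after each observation. The snippet below is a minimal sketch of that Bayesian belief update, not the authors' system: the object names, the 0.9/0.1 answer likelihoods, and the function `update_belief` are hypothetical, and it assumes the desired object does not change during the conversation, so no transition step is needed.

```python
def update_belief(belief, likelihood):
    """Bayes rule: posterior(obj) is proportional to likelihood(obj) * prior(obj).

    belief     -- dict mapping each candidate object to its prior probability
    likelihood -- dict mapping each object to P(observation | object is desired)
    """
    unnormalized = {obj: p * likelihood.get(obj, 0.0) for obj, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under the model; keep the prior.
        return dict(belief)
    return {obj: p / total for obj, p in unnormalized.items()}


# Hypothetical household items with a uniform prior.
belief = {"red bowl": 1 / 3, "blue bowl": 1 / 3, "spoon": 1 / 3}

# The robot asks "Is it a bowl?" and hears "yes".
# Assumed answer model: the user's reply is correct 90% of the time.
likelihood_yes = {"red bowl": 0.9, "blue bowl": 0.9, "spoon": 0.1}

posterior = update_belief(belief, likelihood_yes)
```

After the "yes" answer, probability mass shifts from the spoon to the two bowls, and a follow-up question about color would separate those. A POMDP policy chooses which question to ask, or when to commit to a delivery, based on the current belief.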


Temporal and Object Relations in Unsupervised Plan and Activity Recognition

AAAI Conferences

We consider ways to improve the performance of unsupervised plan and activity recognition techniques by considering temporal and object relations in addition to postural data. Temporal relationships can help recognize activities with cyclic structure and are often implicit because plans have degrees of ordering actions. Relations with objects can help disambiguate observed activities that otherwise share a user's posture and position. We develop and investigate graphical models that extend the popular latent Dirichlet allocation approach with temporal and object relations, examine the relative performance and runtime trade-offs using a standard dataset, and consider the cost/benefit trade-offs these extensions offer in the context of human-robot and human-computer interaction.


Minecraft as an Experimental World for AI in Robotics

AAAI Conferences

Performing experimental research on robotic platforms involves numerous practical complications, while studying collaborative interactions and efficiently collecting data from humans benefit from real time response. Roboticists can circumvent some complications by using simulators like Gazebo to test algorithms and building games like the Mars Escape game to collect data. Making use of existing resources for simulation and game creation requires the development of assets and algorithms along with the recruitment and training of users. We have created a Minecraft mod called BurlapCraft which enables the use of the reinforcement learning and planning library BURLAP to model and solve different tasks within Minecraft. BurlapCraft makes AI-HRI development easier in three core ways: the underlying Minecraft environment makes the construction of experiments simple for the developer and so allows the rapid prototyping of experimental setup; BURLAP contributes a wide variety of extensible algorithms for learning and planning, allowing easy iteration and development of task models and algorithms; and the familiarity and ubiquity of Minecraft trivializes the recruitment and training of users. To validate BurlapCraft as a platform for AI development, we demonstrate the execution of A*, BFS, RMax, language understanding, and learning language groundings from user demonstrations in five Minecraft "dungeons."