Markov Models
Probabilistic Planning for Decentralized Multi-Robot Systems
Amato, Christopher (University of New Hampshire) | Konidaris, George (Duke University) | Omidshafiei, Shayegan (Massachusetts Institute of Technology) | Agha-mohammadi, Ali-akbar (Qualcomm Research) | How, Jonathan P. (Massachusetts Institute of Technology) | Kaelbling, Leslie P. (Massachusetts Institute of Technology)
Multi-robot systems are an exciting application domain for AI research in general and for Dec-POMDPs specifically. MacDec-POMDP methods can produce high-quality general solutions for realistic heterogeneous multi-robot coordination problems by automatically generating control and communication policies, given a model. In contrast to most existing multi-robot methods that are specialized to a particular problem class, our approach can synthesize policies that exploit any opportunities for coordination that are present in the problem, while balancing uncertainty, sensor information, and information about other agents.
Complexity of Self-Preserving, Team-Based Competition in Partially Observable Stochastic Games
Allen, Marty (University of Wisconsin-La Crosse)
Partially observable stochastic games (POSGs) are a robust and precise model for decentralized decision making under conditions of imperfect information, and extend popular Markov decision problem models. Complexity results for a wide range of such problems are known when agents work cooperatively to pursue common interests. When agents compete, things are less well understood. We show that under one understanding of rational competition, such problems are complete for the class NEXP^NP. This result holds for any such problem comprised of two competing teams of agents, where teams may be of any size whatsoever.
Planning Under Uncertainty with Weighted State Scenarios
Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)
External factors are hard to model using a Markovian state in several real-world planning domains. Although planning can be difficult in such domains, it may be possible to exploit long-term dependencies between states of the environment during planning. We introduce weighted state scenarios to model long-term sequences of states, and we use a model based on a Partially Observable Markov Decision Process to reason about scenarios during planning. Experiments show that our model outperforms other methods for decision making in two real-world domains.
Robotic Social Feedback for Object Specification
Wu, Emily (Brown University) | Han, Yuxin (Rhode Island School of Design) | Whitney, David (Brown University) | Oberlin, John (Brown University) | MacGlashan, James (Brown University) | Tellex, Stefanie (Brown University)
Issuing and following instructions is a common task in many forms of both human-human and human-robot collaboration. With two human participants, the accuracy of instruction following increases if the collaborators can monitor the state of their partners and respond to them through conversation (Clark and Krych 2004), a process we call social feedback. Despite this benefit in human-human interaction, current human-robot collaboration systems process instructions in non-incremental batches, which can achieve good accuracy but does not allow for reactive feedback (Tellex et al. 2011; Matuszek et al. 2012; Tellex et al. 2012; Misra et al. 2014). In this paper, we show that giving a robot the ability to ask the user questions results in responsive conversations and allows the robot to quickly determine the object that the user desires. This social feedback loop between person and robot allows a person to create an internal model for the robot's mental state and adapt their own behavior to better inform the robot. To close the human-robot feedback loop, we employ a Partially Observable Markov Decision Process (POMDP) to produce a policy which will lead to the determination of the object in the shortest amount of time. To test our approach, we perform user studies to measure our robot's ability to deliver common household items requested by the participant. We compare delivery speed and accuracy both with and without social feedback.
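The belief tracking that a POMDP of this kind relies on can be illustrated with a minimal sketch. It assumes the hidden state is simply which object the user wants and that it does not change during the conversation. The object names, the yes/no question, and the 90% answer-accuracy noise model are hypothetical and are not taken from the paper.

```python
def update_belief(belief, likelihood):
    """Bayes update b'(s) proportional to P(o | s) * b(s) for a static hidden state."""
    unnormalized = {s: belief[s] * likelihood[s] for s in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical household objects with a uniform prior over the desired one.
objects = ["spoon", "mug", "bowl"]
belief = {o: 1 / len(objects) for o in objects}

# The robot asks "Do you want the mug?" and hears "no".
# Assumed noise model: the answer is understood correctly 90% of the time.
p_correct = 0.9
likelihood_no = {o: (1 - p_correct) if o == "mug" else p_correct for o in objects}

belief = update_belief(belief, likelihood_no)
```

After the update, probability mass shifts away from the mug and is shared by the remaining objects. A planner would then choose the next question or action based on this belief.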
Temporal and Object Relations in Unsupervised Plan and Activity Recognition
Freedman, Richard G. (University of Massachusetts Amherst) | Jung, Hee-Tae (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)
We consider ways to improve the performance of unsupervised plan and activity recognition techniques by considering temporal and object relations in addition to postural data. Temporal relationships can help recognize activities with cyclic structure and are often implicit because plans have degrees of ordering actions. Relations with objects can help disambiguate observed activities that otherwise share a user's posture and position. We develop and investigate graphical models that extend the popular latent Dirichlet allocation approach with temporal and object relations, examine the relative performance and runtime trade-offs using a standard dataset, and consider the cost/benefit trade-offs these extensions offer in the context of human-robot and human-computer interaction.
Minecraft as an Experimental World for AI in Robotics
Aluru, Krishna Chaitanya (Brown University) | Tellex, Stefanie (Brown University) | Oberlin, John (Brown University) | MacGlashan, James (Brown University)
Performing experimental research on robotic platforms involves numerous practical complications, while studying collaborative interactions and efficiently collecting data from humans benefit from real-time response. Roboticists can circumvent some complications by using simulators like Gazebo to test algorithms and building games like the Mars Escape game to collect data. Making use of existing resources for simulation and game creation requires the development of assets and algorithms along with the recruitment and training of users. We have created a Minecraft mod called BurlapCraft which enables the use of the reinforcement learning and planning library BURLAP to model and solve different tasks within Minecraft. BurlapCraft makes AI-HRI development easier in three core ways: the underlying Minecraft environment makes the construction of experiments simple for the developer and so allows the rapid prototyping of experimental setup; BURLAP contributes a wide variety of extensible algorithms for learning and planning, allowing easy iteration and development of task models and algorithms; and the familiarity and ubiquity of Minecraft trivializes the recruitment and training of users. To validate BurlapCraft as a platform for AI development, we demonstrate the execution of A*, BFS, RMax, language understanding, and learning language groundings from user demonstrations in five Minecraft "dungeons."
Automatic Real-Time Music Generation for Games
Engels, Steve (University of Toronto) | Tong, Tiffany (University of Toronto) | Chan, Fabian (University of Toronto)
Music composition can be a challenge for many small- to medium-sized game companies, largely due to the expense and difficulty in creating original music for each level of a game. To address this, we developed a tool that automatically generates original music, by training a music generator on pieces whose style the game designer wishes to imitate. The generator then creates original music in that style in real-time, and switches between styles when signaled by the game. This software has been refined to produce music that is coherent and imitates a composer's larger music structure.
A Hierarchical MdMC Approach to 2D Video Game Map Generation
Snodgrass, Sam (Drexel University) | Ontanon, Santiago (Drexel University)
In this paper we describe a hierarchical method for procedurally generating 2D game maps using multi-dimensional Markov chains (MdMCs). Our method takes a collection of 2D game maps, breaks them into small chunks and performs clustering to find a set of chunks that correspond to high-level structures (high-level tiles) in the training maps. This set of high-level tiles is then used to re-represent the training maps, and to fit two sets of MdMC models: a high-level model captures the distribution of high-level tiles in the map, and a set of low-level models capture the internal structure of each high-level tile. These two sets of models can then be used to hierarchically generate new maps. We test our approach using two classic games, Super Mario Bros. and Loderunner, and compare the results against other existing map generators.
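As a rough illustration of a single (non-hierarchical) multi-dimensional Markov chain over tiles, the sketch below conditions each tile on its left and upper neighbors. The choice of neighborhood, the generation order, the uniform fallback for unseen contexts, and the toy training maps are all assumptions made for illustration. The clustering into high-level tiles described in the abstract is omitted.

```python
import random
from collections import Counter, defaultdict


def fit_mdmc(maps):
    """Count how often each tile appears given its (left, above) neighbors."""
    counts = defaultdict(Counter)
    for grid in maps:
        for r, row in enumerate(grid):
            for c, tile in enumerate(row):
                left = row[c - 1] if c > 0 else None
                above = grid[r - 1][c] if r > 0 else None
                counts[(left, above)][tile] += 1
    return counts


def sample_map(counts, height, width, rng):
    """Generate a map row by row, sampling each tile from the learned counts."""
    tiles = sorted({t for ctr in counts.values() for t in ctr})
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            left = row[c - 1] if c > 0 else None
            above = grid[r - 1][c] if r > 0 else None
            ctr = counts.get((left, above))
            if ctr:
                choices, weights = zip(*sorted(ctr.items()))
                row.append(rng.choices(choices, weights=weights)[0])
            else:
                # Context never seen in training: fall back to a uniform choice.
                row.append(rng.choice(tiles))
        grid.append("".join(row))
    return grid


# Hypothetical training map: '-' is empty space, '#' is solid ground.
training_maps = [["--##", "-###", "####"]]
model = fit_mdmc(training_maps)
new_map = sample_map(model, height=3, width=4, rng=random.Random(0))
```

In a hierarchical variant, one such model would be fit over high-level tiles and a separate low-level model would fill in each high-level tile.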
Tuning Belief Revision for Coordination with Inconsistent Teammates
Sarratt, Trevor (University of California Santa Cruz) | Jhala, Arnav (University of California Santa Cruz)
Coordination with an unknown human teammate is a notable challenge for cooperative agents. Behavior of human players in games with cooperating AI agents is often sub-optimal and inconsistent, leading to choreographed and limited cooperative scenarios in games. This paper considers the difficulty of cooperating with a teammate whose goal and corresponding behavior change periodically. Previous work uses Bayesian models for updating beliefs about cooperating agents based on observations. We describe belief models for on-line planning, discuss tuning in the presence of noisy observations, and empirically demonstrate its effectiveness in coordinating with inconsistent agents in a simple domain. Further work in this area promises to lead to techniques for more interesting cooperative AI in games.
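One way to see why a plain Bayesian update struggles with a teammate whose goal changes is the sketch below. It adds a tunable `switch_prob` that mixes the belief with a uniform distribution before each update. This particular mixing scheme, the parameter name, and the numbers are assumptions for illustration, not the belief models evaluated in the paper.

```python
def revise_belief(belief, likelihood, switch_prob):
    """One belief revision step over a teammate's possible goals.

    belief      -- current probabilities, one per goal
    likelihood  -- P(observed action | goal), one per goal
    switch_prob -- assumed chance that the teammate re-picks a goal uniformly
    """
    n = len(belief)
    # Predict: allow for a possible goal change before seeing the observation.
    predicted = [(1 - switch_prob) * b + switch_prob / n for b in belief]
    # Correct: standard Bayes rule on the predicted belief.
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# The agent is certain the teammate pursues goal 0, but the new
# observation is four times more likely under goal 1.
belief = [1.0, 0.0]
likelihood = [0.2, 0.8]

stuck = revise_belief(belief, likelihood, switch_prob=0.0)
adaptive = revise_belief(belief, likelihood, switch_prob=0.1)
```

With `switch_prob=0.0` the belief can never leave goal 0, while a small positive value lets evidence for goal 1 accumulate. A larger value reacts faster but is more easily misled by noisy observations, which is the trade-off that tuning has to balance.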
A Complete Recipe for Stochastic Gradient MCMC
Ma, Yi-An | Chen, Tianqi | Fox, Emily B.
Many recent Markov chain Monte Carlo (MCMC) samplers leverage continuous dynamics to define a transition kernel that efficiently explores a target distribution. In tandem, a focus has been on devising scalable variants that subsample the data and use stochastic gradients in place of full-data gradients in the dynamic simulations. However, such stochastic gradient MCMC samplers have lagged behind their full-data counterparts in terms of the complexity of dynamics considered since proving convergence in the presence of the stochastic gradient noise is non-trivial. Even with simple dynamics, significant physical intuition is often required to modify the dynamical system to account for the stochastic gradient noise. In this paper, we provide a general recipe for constructing MCMC samplers--including stochastic gradient versions--based on continuous Markov processes specified via two matrices. We constructively prove that the framework is complete. That is, any continuous Markov process that provides samples from the target distribution can be written in our framework. We show how previous continuous-dynamic samplers can be trivially "reinvented" in our framework, avoiding the complicated sampler-specific proofs. We likewise use our recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC). Our experiments on simulated data and a streaming Wikipedia analysis demonstrate that the proposed SGRHMC sampler inherits the benefits of Riemann HMC, with the scalability of stochastic gradient methods.
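For orientation, the sketch below shows stochastic gradient Langevin dynamics (SGLD), one of the simplest stochastic gradient MCMC samplers. It is not the SGRHMC sampler proposed in the paper. The toy problem (inferring a Gaussian mean with known unit variance), the prior variance, the constant step size, and the batch size are assumptions chosen for illustration.

```python
import random


def sgld_gaussian_mean(data, n_steps, step_size, batch_size, rng, prior_var=100.0):
    """Sample the mean of a unit-variance Gaussian with SGLD.

    Each step follows a minibatch estimate of the log-posterior gradient
    and adds Gaussian noise with variance equal to the step size.
    """
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        grad_log_prior = -theta / prior_var
        # Rescale the minibatch gradient so it estimates the full-data gradient.
        grad_log_lik = (n / batch_size) * sum(x - theta for x in batch)
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        samples.append(theta)
    return samples


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(100)]  # synthetic observations
samples = sgld_gaussian_mean(data, n_steps=5000, step_size=1e-3, batch_size=10, rng=rng)

kept = samples[1000:]  # discard burn-in
posterior_mean_estimate = sum(kept) / len(kept)
```

With a constant step size the samples are only approximately distributed according to the posterior. The minibatch gradient noise slightly inflates their spread, which is the kind of effect the stochastic-gradient corrections discussed in the abstract are meant to address.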