Markov Models
Wu
Issuing and following instructions is a common task in many forms of both human-human and human-robot collaboration. With two human participants, the accuracy of instruction following increases if the collaborators can monitor the state of their partners and respond to them through conversation (Clark and Krych 2004), a process we call social feedback. Despite this benefit in human-human interaction, current human-robot collaboration systems process instructions in non-incremental batches, which can achieve good accuracy but does not allow for reactive feedback (Tellex et al. 2011; Matuszek et al. 2012; Tellex et al. 2012; Misra et al.2014). In this paper, we show that giving a robot the ability to ask the user questions results in responsive conversations and allows the robot to quickly determine the object that the user desires. This social feedback loop between person and robot allows a person to create an internal model for the robot's mental state and adapt their own behavior to better inform the robot. To close the human-robot feedback loop, we employ a Partially Observable Markov Decision Process (POMDP) to produce a policy which will lead to the determination of the object in the shortest amount of time. To test our approach, we perform user studies to measure our robot's ability to deliver common household items requested by the participant. We compare delivery speed and accuracy both with and without social feedback.
Curran
Although we would like our robots to have completely autonomous behavior, this is often not possible. Some parts of a task might be hard to automate, perhaps due to hard-to-interpret sensor information, or a complex environment. In this case, using shared autonomy or teleoperation is preferable to an error-prone autonomous approach. However, the question of which parts of a task to allocate to the human, and which to the robot can often be tricky. In this work, we introduce A3P, a risk-aware task-level reinforcement learning algorithm. A3P represents a task-level state machine as a POMDP. In this paper, we introduce A3P, a risk-aware algorithm that discovers when to hand off subtasks to a human assistant. A3P models the task as a Partially Observably Markov Decision Process (POMDP) and explicitly represents failures as additional state-action pairs. Based on the model, the algorithm allows the user to allocate subtasks the robot or the human in such a way as to manage the worst-case performance time for the overall task.
Agarwal
We study the relationship between performance and practice by analyzing the activity of many players of a casual online game. We find significant heterogeneity in the improvement of player performance, given by score, and address this by dividing players into similar skill levels and segmenting each player's activity into sessions, that is, sequence of game rounds without an extended break. After disaggregating data, we find that performance improves with practice across all skill levels. More interestingly, players are more likely to end their session after an especially large improvement, leading to a peak score in their very last game of a session. In addition, success is strongly correlated with a lower quitting rate when the score drops, and only weakly correlated with skill, in line with psychological findings about the value of persistence and "grit:" successful players are those who persist in their practice despite lower scores. Finally, we train an ฮต-machine, a type of hidden Markov model, and find a plausible mechanism of game play that can predict player performance and quitting the game. Our work raises the possibility of real-time assessment and behavior prediction that can be used to optimize human performance.
Ha
Goal recognition is the task of inferring users' goals from sequences of observed actions. By enabling player-adaptive digital games to dynamically adjust their behavior in concert with players' changing goals, goal recognition can inform adaptive decision making for a broad range of entertainment, training, and education applications. This paper presents a goal recognition framework based on Markov logic networks (MLN). The model's parameters are directly learned from a corpus of actions that was collected through player interactions with a non-linear educational game. An empirical evaluation demonstrates that the MLN goal recognition framework accurately predicts players' goals in a game environment with multiple solution paths.
Dereszynski
We study the problem of learning probabilistic models of high-level strategic behavior in the real-time strategy (RTS) game StarCraft. The models are automatically learned from sets of game logs and aim to capture the common strategic states and decision points that arise in those games. Unlike most work on behavior/strategy learning and prediction in RTS games, our data-centric approach is not biased by or limited to any set of preconceived strategic concepts. Further, since our behavior model is based on the well-developed and generic paradigm of hidden Markov models, it supports a variety of uses for the design of AI players and human assistants. For example, the learned models can be used to make probabilistic predictions of a player's future actions based on observations, to simulate possible future trajectories of a player, or to identify uncharacteristic or novel strategies in a game database. In addition, the learned qualitative structure of the model can be analyzed by humans in order to categorize common strategic elements. We demonstrate our approach by learning models from 331 expert-level games and provide both a qualitative and quantitative assessment of the learned model's utility.
Groves
Hidden Markov Models have been used frequently in the audio domain to identify underlying musical structure. Much less work has been done in the purely symbolic realm. Recently, a substantial amount of expert-labelled symbolic musical data has been injected into the research community. The new availability of data allows for the application of machine learning models to purely symbolic tasks. Similarly, the continued expansion of the field of machine learning provides new perspectives and implementations of machine learning methods, which are powerful tools when approaching complex musical challenges. This research explores the use of an extended probabilistic model such as the Hidden Semi-Markov Model (HSMM) to approach the task of automatic harmonization. One distinct advantage of the HSMM is that it is able to automatically differentiate harmonic boundaries, through its inclusion of an extra parameter: duration. In this way, a melody can be harmonized automatically in the style of a particular corpus. In the case of this research, the corpus was in the style of Rock'n' Roll.
Rowe
A key functionality provided by interactive narrative systems is narrative adaptation: tailoring story experiences in response to users' actions and needs. We present a data-driven framework for dynamically tailoring events in interactive narratives using modular reinforcement learning. The framework involves decomposing an interactive narrative into multiple concurrent sub-problems, formalized as adaptable event sequences (AESs). Each AES is modeled as an independent Markov decision process (MDP). Policies for each MDP are induced using a corpus of user interaction data from an interactive narrative system with exploratory narrative adaptation policies. Rewards are computed based on users' experiential outcomes. Conflicts between multiple policies are handled using arbitration procedures. In addition to introducing the framework, we describe a corpus of user interaction data from a testbed interactive narrative, CRYSTAL ISLAND, for inducing narrative adaptation policies. Empirical findings suggest that the framework can effectively shape users' interactive narrative experiences.
Baikadi
The problem of goal recognition, and its more general form plan recognition, has been the subject of extensive investigation in the AI community. However, there have been relatively few empirical investigations of goal recognition models in the intelligent narrative technologies community to date, and little is known about how computational models of interactive narrative can inform goal recognition. In this paper, we investigate a novel goal recognition model based on Markov Logic Networks (MLNs) that leverages narrative discovery events to enrich its representation of narrative state. An empirical evaluation shows that the enriched model outperforms a prior state-of-the-art MLN model in terms of accuracy, convergence rate, and the point of convergence.
Etheredge
Player classification allows for considerable improvements on both game analytics and game adaptivity. With this paper we aim at reversing the ad-hoc tendency in player classification methods, by proposing an approach to player classification that can be integrated across different games and genres and is particularly suited to be used by game designers. This paper describes our generic method of interaction-based player classification, which consists of three components: (i) intercepting player interactions, (ii) finding player types through fuzzy cluster analysis and (iii) classification using Hidden Markov Models (HMM). To showcase our method we developed Blindmaze, a simple web-based hidden maze game publicly available, featuring a bounded set of interactions. All data collected from a game is interaction-based, requiring minimal implementation effort from the game developers. It is concluded that our method makes player classification even more available by making it generic and re-usable across different games.
Min
While many open-ended digital games feature non-linear storylines and multiple solution paths, it is challenging for game developers to create effective game experiences in these settings due to the freedom given to the player. To address these challenges, goal recognition, a computational player-modeling task, has been investigated to enable digital games to dynamically predict players' goals. This paper presents a goal recognition framework based on stacked denoising autoencoders, a variant of deep learning. The learned goal recognition models, which are trained from a corpus of player interactions, not only offer improved performance, but also offer the substantial advantage of eliminating the need for labor-intensive feature engineering. An evaluation demonstrates that the deep learning-based goal recognition framework significantly outperforms the previous state-of-the-art goal recognition approach based on Markov logic networks.