Uber's AI plays text-based games like a human


Can AI learn to play text-based games like a human? That's the question applied scientists at Uber's AI research division set out to answer in a recent study. Their exploration and imitation-learning-based system -- which builds upon an earlier framework called Go-Explore -- taps policies to solve a game by following paths (or trajectories) with high rewards. "Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents," wrote the coauthors of a paper describing the work.
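The exploration idea behind Go-Explore can be sketched on a toy problem. The following is a minimal illustration only, not the system from the Uber study: the corridor game, the `step` function, the uniform cell selection, and every name and parameter here are assumptions made for the example. It keeps an archive of reached states ("cells"), repeatedly returns to one of them, explores from there, and stores the best trajectory found to each cell.

```python
import random

ACTIONS = ["go east", "go west"]
GOAL = 5  # hypothetical toy game: rooms 0..5 on a line, reward on reaching room 5


def step(state, action):
    """Deterministic toy transition: returns (next_state, reward)."""
    nxt = min(state + 1, GOAL) if action == "go east" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def explore(iterations=2000, steps_per_iter=3, seed=0):
    """Build an archive mapping each reached cell to (score, trajectory)."""
    rng = random.Random(seed)
    archive = {0: (0.0, [])}
    for _ in range(iterations):
        # 1. Select a non-terminal cell from the archive (uniformly, as an assumption).
        cell = rng.choice([c for c in archive if c != GOAL])
        score, trajectory = archive[cell]
        # 2. "Return" to it. The toy game is deterministic, so restoring the
        #    state is equivalent to replaying the stored trajectory.
        state, trajectory = cell, list(trajectory)
        # 3. Explore with random actions, archiving new or better cells.
        for _ in range(steps_per_iter):
            action = rng.choice(ACTIONS)
            state, reward = step(state, action)
            score += reward
            trajectory.append(action)
            known = archive.get(state)
            if (known is None or score > known[0]
                    or (score == known[0] and len(trajectory) < len(known[1]))):
                archive[state] = (score, list(trajectory))
            if state == GOAL:
                break
    return archive
```

In a setup like the one described above, the high-reward trajectories collected in the archive would then serve as demonstrations for an imitation-learning step, which this sketch leaves out.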

A Survey of Text Games for Reinforcement Learning informed by Natural Language Artificial Intelligence

Reinforcement Learning (RL) has shown human-level performance in solving complex, single setting virtual environments Mnih et al. [2013] & Silver et al. [2016]. However, applications and theory in RL problems have been far less developed and it has been posed that this is due to a wide divide between the empirical methodology associated with virtual environments in RL research and the challenges associated with reality Dulac-Arnold et al. [2019]. Simply put, Text Games provide a safe and data efficient way to learn from environments that mimic language found in real-world scenarios Shridhar et al. [2020]. Natural language (NL) has been introduced as a solution to many of the challenges in RL Luketina et al. [2019], as NL can facilitate the transfer of abstract knowledge to downstream tasks. However, RL approaches on these language driven environments are still limited in their development and therefore a call has been made for an improvement on the evaluation settings where language is a first-class component. Text Games gained wider acceptance as a testbed for NL research following work from Narasimhan et al. [2015] who leveraged the Deep Q Network (DQN) framework for policy learning on a set of synthetic textual games. Text Games are both partially observable (as shown in Figure 1) and include outcomes that make reward signals simple to define, making them a suitable problem for Reinforcement Learning to solve. However, research so far has been performed independently, with many authors ...

Figure 1: Sample gameplay from a fantasy Text Game as given by Narasimhan et al. [2015] where the player takes the action 'Go East' to cross the bridge.

Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games Machine Learning

We show that Reinforcement Learning (RL) methods for solving Text-Based Games (TBGs) often fail to generalize on unseen games, especially in small data regimes. To address this issue, we propose Context Relevant Episodic State Truncation (CREST) for irrelevant token removal in observation text for improved generalization. Our method first trains a base model using Q-learning, which typically overfits the training games. The base model's action token distribution is used to perform observation pruning that removes irrelevant tokens. A second bootstrapped model is then retrained on the pruned observation text. Our bootstrapped agent shows improved generalization in solving unseen TextWorld games, using 10x-20x fewer training games compared to previous state-of-the-art methods while also requiring fewer training episodes.
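The pruning step can be illustrated with a simplified sketch. It assumes that relevance is judged by exact token match against the actions a base agent issued; the abstract does not specify the relevance measure, so this rule, the function names, and the `threshold` parameter are all assumptions for illustration rather than the CREST procedure itself.

```python
from collections import Counter
import string


def action_token_distribution(episodes):
    """Relative frequency of each token across the actions a base agent issued."""
    counts = Counter(
        token
        for actions in episodes
        for action in actions
        for token in action.lower().split()
    )
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def prune_observation(observation, token_dist, threshold=0.0):
    """Keep only observation tokens whose action-token frequency exceeds threshold.

    Assumption: exact string match stands in for the relevance measure.
    """
    words = (w.strip(string.punctuation) for w in observation.lower().split())
    return " ".join(w for w in words if token_dist.get(w, 0.0) > threshold)


# Hypothetical base-agent action logs from two training episodes.
episodes = [["take coin", "go east"], ["open door", "go east"]]
dist = action_token_distribution(episodes)
print(prune_observation("You see a shiny coin on the floor. A door leads east.", dist))
# prints "coin door east"
```

A second agent would then be trained on the pruned text in place of the full observation, which is the bootstrapping step the abstract describes.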

Go-Blend behavior and affect Artificial Intelligence

This paper proposes a paradigm shift for affective computing by viewing the affect modeling task as a reinforcement learning process. According to our proposed framework the context (environment) and the actions of an agent define the common representation that interweaves behavior and affect. To realise this framework we build on recent advances in reinforcement learning and use a modified version of the Go-Explore algorithm which has showcased supreme performance in hard exploration tasks. In this initial study, we test our framework in an arcade game by training Go-Explore agents to both play optimally and attempt to mimic human demonstrations of arousal. We vary the degree of importance between optimal play and arousal imitation and create agents that can effectively display a palette of affect and behavioral patterns. Our Go-Explore implementation not only introduces a new paradigm for affect modeling; it empowers believable AI-based game testing by providing agents that can blend and express a multitude of behavioral and affective patterns.
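One simple way to vary the importance between optimal play and arousal imitation is a convex combination of two reward signals. The sketch below assumes that form; the abstract does not give the actual weighting scheme, and the function and parameter names are hypothetical.

```python
def blended_reward(behavior_reward, affect_reward, affect_weight):
    """Mix a gameplay reward with an arousal-imitation reward.

    affect_weight = 0.0 rewards optimal play only; 1.0 rewards imitation only.
    Assumption: a linear blend; the paper may weight the terms differently.
    """
    if not 0.0 <= affect_weight <= 1.0:
        raise ValueError("affect_weight must lie in [0, 1]")
    return (1.0 - affect_weight) * behavior_reward + affect_weight * affect_reward
```

Sweeping `affect_weight` from 0 to 1 would, under this assumption, yield agents ranging from purely score-driven to purely imitation-driven.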

How To Avoid Being Eaten By a Grue: Exploration Strategies for Text-Adventure Agents Artificial Intelligence

Most current reinforcement learning algorithms are not capable of effectively handling such a large number of possible actions per turn. Poor sample efficiency, consequently, results in agents that are unable to pass bottleneck states, where they are unable to proceed because they do not see the right action sequence to pass the bottleneck enough times to be sufficiently reinforced. Building on prior work using knowledge graphs in reinforcement learning, we introduce two new game state exploration strategies. We compare our exploration strategies against strong baselines on the classic text-adventure game, Zork1, where prior agents have been unable to get past a bottleneck where the agent is eaten by a Grue.