Lee, Dennis (University of California, Berkeley) | Tang, Haoran (University of California, Berkeley) | Zhang, Jeffrey O. (University of California, Berkeley) | Xu, Huazhe (University of California, Berkeley) | Darrell, Trevor (University of California, Berkeley) | Abbeel, Pieter (University of California, Berkeley)
We present a novel modular architecture for StarCraft II AI. The architecture splits responsibilities between multiple modules that each control one aspect of the game, such as build-order selection or tactics. A centralized scheduler reviews macros suggested by all modules and decides their order of execution. An updater keeps track of environment changes and instantiates macros into series of executable actions. Modules in this framework can be optimized independently or jointly via human design, planning, or reinforcement learning. We present the first result of applying deep reinforcement learning techniques to training two of the six modules of a modular agent with self-play, achieving 92% or 86% win rates against the "Harder" (level 5) built-in Blizzard bot in Zerg vs. Zerg matches, with or without fog-of-war.
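The module/scheduler/updater split described above can be illustrated with a minimal sketch; all class, macro, and priority names here are hypothetical illustrations, not the paper's implementation:

```python
class Module:
    """One responsibility (e.g. build order or tactics); proposes a macro."""
    def __init__(self, name, priority):
        self.name, self.priority = name, priority

    def propose(self, observation):
        # Illustrative: each module suggests a single named macro with a priority.
        return (self.priority, f"{self.name}_macro")

class Scheduler:
    """Reviews macros from all modules and decides their order of execution."""
    def order(self, proposals):
        return [macro for _, macro in sorted(proposals, reverse=True)]

class Updater:
    """Instantiates a macro into a series of executable actions."""
    def expand(self, macro):
        return [f"{macro}_step{i}" for i in range(2)]

modules = [Module("build_order", 1), Module("tactics", 2)]
scheduler, updater = Scheduler(), Updater()
proposals = [m.propose(observation=None) for m in modules]
plan = [a for macro in scheduler.order(proposals) for a in updater.expand(macro)]
print(plan)  # higher-priority macro's actions come first
```

The point of the pattern is that each `Module` can be swapped for a scripted, planned, or learned component without touching the scheduler or updater.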
Intelligent autonomous agents acting in dynamic environments in real time are often required to follow long-term strategies while also remaining reactive and being able to act deliberately. To create intelligent behaviors for video game characters, there are two common approaches: planners are used for long-term strategic planning, whereas Behavior Trees allow for reactive acting. Although both methodologies have their advantages, when used on their own they fail to fully achieve both requirements described above. In this work, we propose a hybrid approach combining a Hierarchical Task Network planner for high-level planning while delegating low-level decision making and acting to Behavior Trees. Furthermore, we compare this approach with a pure planner in a multi-agent environment.
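The reactive half of such a hybrid can be sketched as a minimal Behavior Tree with sequence and fallback composites; an HTN planner would sit above this layer and select which tree to run. The node classes and the enemy/patrol scenario below are illustrative, not the paper's system:

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

class Fallback:
    """Ticks children in order; succeeds as soon as one child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

class Condition:
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, state):
        return SUCCESS if self.predicate(state) else FAILURE

class Action:
    def __init__(self, effect):
        self.effect = effect
    def tick(self, state):
        self.effect(state)
        return SUCCESS

# React to a visible enemy; otherwise fall back to patrolling.
tree = Fallback(
    Sequence(Condition(lambda s: s["enemy_visible"]),
             Action(lambda s: s.__setitem__("mode", "attack"))),
    Action(lambda s: s.__setitem__("mode", "patrol")),
)
state = {"enemy_visible": True}
tree.tick(state)
print(state["mode"])  # "attack"
```

Re-ticking the same tree every frame is what makes the behavior reactive: the moment `enemy_visible` changes, the next tick takes the other branch.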
Mixed-initiative PCG systems provide a way to leverage the expressive power of algorithmic techniques for content generation in a manner that lowers the technical barrier for content creators. While these tools are a proof of concept of how PCG systems can aid aspiring designers in reaching their vision, issues remain in capturing designer intent and in interface complexity. In this paper we introduce CADI (Conversational Assistive Design Interface), a mixed-initiative PCG system that uses a natural language interface to explore the design space of variations of the game Pong. We motivate the creation of CADI and discuss the implementation and design decisions taken to address the issues of designer intent and interface complexity in mixed-initiative PCG systems.
Automatic game design is an increasingly popular area of research concerned with systems that create content or complete games autonomously. The interest in such systems is twofold: games are highly stochastic environments that allow this task to be framed as a complex optimization problem with automatic play-testing, and they serve as benchmarks for advancing the state of the art in AI methods. In this paper, we propose a general approach that employs the N-Tuple Bandit Evolutionary Algorithm (NTBEA) to tune parameters of three different games of the General Video Game AI (GVGAI) framework. The objective is to adjust the game experience of the players so that the distribution of score events through the game approximates certain pre-defined target curves. We report satisfactory results for different target score trends and games, paving the way for future research in the area of automatically tuning player experience.
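The objective described above, matching the distribution of score events to a target curve, can be sketched as a simple distance that an optimizer like NTBEA would minimize. The curves and parameter settings below are invented for illustration, and NTBEA itself is not reproduced here:

```python
def curve_distance(observed, target):
    """Mean squared distance between an observed score curve and the target."""
    assert len(observed) == len(target)
    return sum((o - t) ** 2 for o, t in zip(observed, target)) / len(observed)

# Illustrative: a linearly rising target versus two candidate parameter settings,
# where each curve is the cumulative score sampled at five points in a playthrough.
target    = [0, 1, 2, 3, 4]
setting_a = [0, 0, 0, 2, 4]   # scoring opportunities arrive late in the game
setting_b = [0, 1, 2, 3, 4]   # scoring opportunities rise steadily
best = min([setting_a, setting_b],
           key=lambda curve: curve_distance(curve, target))
print(best is setting_b)  # True: the steady curve matches the target exactly
```

In the full approach, `observed` would come from automatic play-testing of a game instance under the candidate parameters, so the distance doubles as a (noisy) fitness signal.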
In turn-based multi-action adversarial games each player turn consists of several atomic actions, resulting in an extremely high branching factor. Many strategy board, card, and video games fall into this category, which is currently best played by Evolutionary MCTS (EMCTS): searching a tree whose nodes represent action sequences as genomes, and whose edges represent mutations of those genomes. However, regular EMCTS is unable to search beyond the current player's turn, leading to strategic short-sightedness. In this paper, we extend EMCTS to search to any given search depth beyond the current turn, using simple models of its own and the opponent's behavior. Experiments on the game Hero Academy show that this Flexible-Horizon EMCTS (FH-EMCTS) convincingly outperforms several baselines including regular EMCTS, Online Evolutionary Planning (OEP), and vanilla MCTS, at all tested numbers of atomic actions per turn. Additionally, the separate contributions of the behavior models and the flexible search horizon are analyzed.
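The genome-and-mutation view of the search tree can be sketched in a few lines; this is a minimal illustration of an EMCTS-style edge, not the authors' implementation:

```python
import random

def mutate(genome, legal_actions, rng):
    """An EMCTS-style edge: copy the action-sequence genome and mutate one position."""
    child = list(genome)
    i = rng.randrange(len(child))        # pick one position in the turn's sequence
    child[i] = rng.choice(legal_actions) # replace it with a legal atomic action
    return child

rng = random.Random(42)
parent = ["move", "attack", "heal"]      # one turn = a sequence of atomic actions
child = mutate(parent, ["move", "attack", "heal", "defend"], rng)
```

Each tree node holds one such genome, so the search mutates whole turn plans rather than branching over every atomic action, which is what tames the extreme branching factor.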
We present a suite of techniques for extending the Partially Observable Monte Carlo Planning algorithm to handle complex multi-agent games. We design the planning algorithm to exploit the inherent structure of the game. When game rules naturally cluster the actions into sets called types, these can be leveraged to extract characteristics and high-level strategies from a sparse corpus of human play. Another key insight is to account for action legality both when extracting policies from game play and when these are used to inform the forward sampling method. We evaluate our algorithm against other baselines and versus ablated versions of itself in the well-known board game Settlers of Catan.
We introduce a domain specific language for procedural content generation (PCG) called Grammatical Item Generation Language (GIGL). GIGL supports a compact representation of PCG with stochastic grammars where generated objects maintain grammatical structures. Advanced features in GIGL allow flexible customizations of the stochastic generation process. GIGL is designed and implemented to have a direct interface with C++, in order to be capable of integration into production games. We showcase the expressiveness and flexibility of GIGL on several representative problem domains in grammatical PCG, and show that the GIGL-based implementations run as fast as comparable C++ implementations and with less code.
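The kind of stochastic grammar GIGL expresses can be sketched as a weighted top-down expansion; the toy item grammar and its weights below are invented for illustration and do not use GIGL's actual syntax (which compiles to C++):

```python
import random

# Toy weighted grammar: each nonterminal maps to (weight, right-hand side) choices.
GRAMMAR = {
    "Item":     [(0.5, ["Weapon"]), (0.5, ["Potion"])],
    "Weapon":   [(1.0, ["sword", "Modifier"])],
    "Potion":   [(1.0, ["potion"])],
    "Modifier": [(0.7, ["plain"]), (0.3, ["flaming"])],
}

def generate(symbol, rng):
    """Expand a symbol top-down, keeping the grammatical (tree) structure."""
    if symbol not in GRAMMAR:            # terminal symbol: a concrete token
        return symbol
    r, cumulative = rng.random(), 0.0
    for weight, rhs in GRAMMAR[symbol]:  # roulette-wheel pick of a production
        cumulative += weight
        if r <= cumulative:
            break
    return [generate(s, rng) for s in rhs]

item = generate("Item", random.Random(7))
```

Because the expansion returns nested lists rather than a flat string, the generated object keeps its grammatical structure, which is the property the abstract highlights.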
Churchill and Buro (2013) launched a line of research through Portfolio Greedy Search (PGS), an algorithm for adversarial real-time planning that uses scripts to simplify the problem's action space. In this paper we present a problem in PGS's search scheme that has hitherto been overlooked: even under the strong assumption that PGS is able to evaluate all actions available to the player, it might fail to return the best action. We then describe an idealized algorithm that is guaranteed to return the best action and present an approximation of it, which we call Nested-Greedy Search (NGS). Empirical results on MicroRTS show that NGS outperforms PGS as well as state-of-the-art methods in matches played on small to medium-sized maps.
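The gap can be illustrated with a drastically simplified one-ply sketch: a purely greedy evaluation can prefer an action whose payoff collapses once the opponent replies, while a nested evaluation anticipates that reply. The payoff matrix and action names are invented for illustration; the actual PGS/NGS algorithms search over script assignments to units:

```python
# payoff[a][b]: value for us if we commit to a and the opponent replies with b.
payoff = {
    "aggressive": {"retreat": 3, "counter": 0},
    "defensive":  {"retreat": 2, "counter": 2},
}

def naive_greedy(payoff):
    """Pick the action with the best payoff against *some* reply (over-optimistic)."""
    return max(payoff, key=lambda a: max(payoff[a].values()))

def nested_greedy(payoff):
    """Pick the action with the best payoff against the opponent's greedy reply."""
    return max(payoff, key=lambda a: min(payoff[a].values()))

print(naive_greedy(payoff), nested_greedy(payoff))  # aggressive defensive
```

Here the naive evaluation chooses "aggressive" (payoff 3 if the opponent retreats) even though a counter reduces it to 0, while the nested evaluation chooses "defensive", which guarantees 2 either way.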
Theatrical improvisation (impro or improv) is a demanding form of live, collaborative performance. Improv is a humorous and playful artform built on an open-ended narrative structure that simultaneously celebrates effort and failure. It is thus an ideal test bed for the development and deployment of interactive artificial intelligence (AI)-based conversational agents, or artificial improvisors. This case study introduces an improv show experiment featuring human actors and artificial improvisors. We have previously developed a deep-learning-based artificial improvisor, trained on movie subtitles, that can generate plausible, context-based lines of dialogue suitable for theatre. In this work, we employ it to control what a subset of human actors say during an improv performance, while a different subset of performers receive human-generated lines. All lines are delivered to the actors through headphones, which every performer wears. This paper describes a Turing test, or imitation game, taking place in a theatre, with both the audience members and the performers left to guess who is a human and who is a machine. To test scientific hypotheses about the perception of humans versus machines, we collect anonymous feedback from volunteer performers and audience members. Our results suggest that rehearsal increases proficiency and the ability to control events in the performance, although consistency with real-world experience is limited by the interface and the mechanisms used to perform the show. We also show that the human-generated lines are shorter, more positive, and contain fewer difficult words but more grammar and spelling mistakes than the lines generated by the artificial improvisor.
We propose a new temporal extension of the logic of Here-and-There (HT) and its equilibria, obtained by combining it with dynamic logic over (linear) traces. Unlike previous temporal extensions of HT based on linear temporal logic, the dynamic logic features allow us to reason about the composition of actions. For instance, this can be used to exercise fine-grained control when planning in robotics, as exemplified by GOLOG. In this paper, we lay the foundations of our approach, and refer to it as "Linear Dynamic Equilibrium Logic", or simply DEL. We start by developing the formal framework of DEL and provide relevant characteristic results. Among them, we elaborate upon the relationships to traditional linear dynamic logic and previous temporal extensions of HT.
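To illustrate the kind of action composition that dynamic-logic modalities add over plain temporal operators, here are two standard formulas in generic PDL-style notation (not DEL's exact syntax): sequencing with `;`, tests with `?`, and iteration with `*`.

```latex
% After executing action a and then action b, the goal holds:
[a \,;\, b]\,\mathit{goal}
% Repeat a while the test t fails; once t succeeds, the goal holds:
[(\neg t?\,;\,a)^{*}\,;\,t?]\,\mathit{goal}
```

The second formula is the sort of structured loop (as in GOLOG procedures) that linear-temporal extensions of HT cannot express directly over action compositions.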