Torrado, Ruben Rodriguez
Evolving Agents for the Hanabi 2018 CIG Competition
Canaan, Rodrigo, Shen, Haotian, Torrado, Ruben Rodriguez, Togelius, Julian, Nealen, Andy, Menzel, Stefan
Hanabi is a cooperative card game with hidden information that has won important industry awards and received some recent academic attention. A two-track competition of agents for the game will take place at the 2018 CIG conference. In this paper, we develop a genetic algorithm that builds rule-based agents by determining the best sequence of rules from a fixed rule set to use as a strategy. In three separate experiments, we remove human assumptions regarding the ordering of rules, add new, more expressive rules to the rule set, and independently evolve agents specialized at specific game sizes. As a result, we achieve scores superior to previously published research for both the mirror and mixed evaluation of agents.

Game-playing agents have a long tradition of serving as benchmarks for AI research. However, most of the focus has traditionally been on competitive, perfect-information games such as Checkers [1], Chess [2] and Go [3]. Cooperative games with imperfect information are an interesting research topic, not only because of the added challenges they pose to researchers, but also because many modern industrial and commercial applications can be characterized as cooperation between humans and machines in order to achieve a mutual goal in an uncertain environment. In this paper, we address a particularly interesting cooperative game with partial information: Hanabi [4].
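The core idea of evolving a rule ordering can be sketched as a simple genetic algorithm over permutations of a rule set. This is a minimal illustration, not the paper's implementation: the rule names, the truncation selection scheme, and the toy fitness function (which would really be the average score of Hanabi games played by an agent firing the first applicable rule in the evolved order) are all placeholders.

```python
import random

random.seed(0)

# Hypothetical rule names; each stands in for a condition-action heuristic.
RULES = ["play_safe_card", "tell_playable", "discard_oldest",
         "tell_most_info", "play_probably_safe", "discard_random"]

def evaluate(chromosome):
    """Toy fitness: reward matching a fixed 'good' ordering. A real fitness
    would be the average Hanabi score achieved by the rule sequence."""
    return sum(1 for pos, rule in enumerate(chromosome) if rule == pos)

def mutate(chromosome):
    """Swap two positions in the rule ordering."""
    c = chromosome[:]
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c

def evolve(pop_size=20, generations=100):
    """Truncation-selection GA: keep the best half, refill with mutants."""
    pop = [random.sample(range(len(RULES)), len(RULES)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=evaluate)

best = evolve()
print([RULES[i] for i in best])
```

Because the chromosome is a permutation, mutation is a swap rather than a point change, which keeps every individual a valid rule ordering.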
Procedural Level Generation Improves Generality of Deep Reinforcement Learning
Justesen, Niels, Torrado, Ruben Rodriguez, Bontrager, Philip, Khalifa, Ahmed, Togelius, Julian, Risi, Sebastian
Over the last few years, deep reinforcement learning (RL) has shown impressive results in a variety of domains, learning directly from high-dimensional sensory streams. However, when networks are trained in a fixed environment, such as a single level in a video game, they usually overfit and fail to generalize to new levels. When RL agents overfit, even slight modifications to the environment can result in poor performance. In this paper, we present an approach to preventing overfitting and producing more general agent controllers: training the agent on a completely new, procedurally generated level in each episode. The level generator produces levels whose difficulty slowly increases in response to the agent's observed performance. Our results show that this approach yields policies that generalize better to other procedurally generated levels than policies trained on fixed levels.
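The training loop described above can be sketched as a difficulty-adaptive curriculum. This is a hedged toy version: the one-dimensional level generator, the success threshold, the difficulty step size, and the `run_episode` stub (which stands in for an actual RL training episode) are all invented for illustration.

```python
import random

random.seed(1)

def generate_level(difficulty, size=10):
    """Toy procedural generator: obstacle density tracks difficulty."""
    n_obstacles = int(difficulty * size)
    cells = ["."] * size
    for i in random.sample(range(1, size - 1), min(n_obstacles, size - 2)):
        cells[i] = "#"
    return "".join(cells)

def run_episode(level):
    """Placeholder for one RL episode; success probability falls with
    obstacle density instead of depending on a learned policy."""
    density = level.count("#") / len(level)
    return random.random() > density

def train(episodes=200, threshold=0.8):
    """Curriculum loop: a fresh level every episode; raise difficulty when
    the recent success rate exceeds the threshold."""
    difficulty, successes = 0.1, []
    for _ in range(episodes):
        level = generate_level(difficulty)
        successes.append(run_episode(level))
        if len(successes) >= 10 and sum(successes[-10:]) / 10 >= threshold:
            difficulty = min(1.0, difficulty + 0.05)
    return difficulty

final_difficulty = train()
```

The key property is that the agent never sees the same level twice, while the generator keeps the task near the edge of the agent's current competence.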
Deep Reinforcement Learning for General Video Game AI
Torrado, Ruben Rodriguez, Bontrager, Philip, Togelius, Julian, Liu, Jialin, Perez-Liebana, Diego
The General Video Game AI (GVGAI) competition and its associated software framework provide a way of benchmarking AI algorithms on a large number of games written in a domain-specific description language. While the competition has seen plenty of interest, it has so far focused on online planning, providing a forward model that allows the use of algorithms such as Monte Carlo Tree Search. In this paper, we describe how we interface GVGAI to the OpenAI Gym environment, a widely used way of connecting agents to reinforcement learning problems. Using this interface, we characterize how widely used implementations of several deep reinforcement learning algorithms fare on a number of GVGAI games. We further analyze the results to provide a first indication of the difficulty of these games, both relative to each other and relative to those in the Arcade Learning Environment under similar conditions.
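The interface follows the standard Gym `reset`/`step` contract, which is what lets off-the-shelf RL implementations run unchanged. The sketch below is a self-contained mock of that contract, not the actual GVGAI wrapper (which bridges to the Java GVGAI framework); the class name, action count, toy reward, and episode length are all invented.

```python
import random

class GVGAIEnvSketch:
    """Minimal Gym-style environment sketch: reset() returns an observation,
    step(action) returns (observation, reward, done, info)."""

    def __init__(self, n_actions=5, episode_length=10):
        self.n_actions = n_actions
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, action):
        assert 0 <= action < self.n_actions
        self.t += 1
        reward = 1.0 if action == 0 else 0.0   # toy reward signal
        done = self.t >= self.episode_length
        return self._observe(), reward, done, {}

    def _observe(self):
        return [self.t]  # stand-in for a pixel observation

# Standard agent-environment loop with a random policy as a placeholder.
env = GVGAIEnvSketch()
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    obs, reward, done, info = env.step(random.randrange(env.n_actions))
    total_reward += reward
```

Any algorithm written against this four-tuple `step` interface can be pointed at a new game by swapping the environment, which is the benchmarking property the paper exploits.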
Optimal Sequential Drilling for Hydrocarbon Field Development Planning
Torrado, Ruben Rodriguez (Repsol S.A.) | Rios, Jesus (IBM TJ Watson Research Center) | Tesauro, Gerald (IBM TJ Watson Research Center)
We present a novel approach for planning the development of hydrocarbon fields, taking into account the sequential nature of well drilling decisions and the possibility of reacting to future information. In a dynamic fashion, we want to optimally decide where to drill each well conditional on every possible piece of information that could be obtained from previous wells. We formulate this sequential drilling optimization problem as a POMDP, and propose an algorithm to search for an optimal drilling policy. We show that our new approach leads to better results compared to the current standard in the oil and gas (O&G) industry.
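The value of conditioning later wells on earlier outcomes can be illustrated with a two-well toy model, a much simpler setting than the paper's POMDP formulation: drill well 1, update the belief about the region by Bayes' rule, then drill well 2 only if its expected value under the updated belief is positive. All probabilities and payoffs below are invented for illustration.

```python
P_GOOD = 0.4                 # prior that the region is productive
P_HIT_GIVEN_GOOD = 0.8       # chance a well succeeds in a productive region
P_HIT_GIVEN_BAD = 0.1
PAYOFF, COST = 10.0, 3.0

def posterior_good(hit):
    """Bayes update on the region after observing the first well."""
    like_good = P_HIT_GIVEN_GOOD if hit else 1 - P_HIT_GIVEN_GOOD
    like_bad = P_HIT_GIVEN_BAD if hit else 1 - P_HIT_GIVEN_BAD
    return like_good * P_GOOD / (like_good * P_GOOD + like_bad * (1 - P_GOOD))

def well_value(p_good):
    """Expected value of drilling one well under a belief about the region."""
    p_hit = p_good * P_HIT_GIVEN_GOOD + (1 - p_good) * P_HIT_GIVEN_BAD
    return p_hit * PAYOFF - COST

def adaptive_policy_value():
    """Drill well 1; drill well 2 only if its updated expected value > 0."""
    value = well_value(P_GOOD)
    p_hit1 = P_GOOD * P_HIT_GIVEN_GOOD + (1 - P_GOOD) * P_HIT_GIVEN_BAD
    for hit, p_obs in ((True, p_hit1), (False, 1 - p_hit1)):
        value += p_obs * max(well_value(posterior_good(hit)), 0.0)
    return value

fixed = 2 * well_value(P_GOOD)       # drill both wells regardless of outcome
adaptive = adaptive_policy_value()   # react to the first well's result
```

With these numbers the adaptive plan is worth 2.28 versus 1.60 for the fixed two-well plan, because a dry first well lets the planner skip an unprofitable second well; the paper's POMDP search generalizes this reasoning to many wells and richer observations.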