Reports on the 2018 AAAI Spring Symposium Series
Amato, Christopher (Northeastern University) | Ammar, Haitham Bou (PROWLER.io) | Churchill, Elizabeth (Google) | Karpas, Erez (Technion - Israel Institute of Technology) | Kido, Takashi (Stanford University) | Kuniavsky, Mike (Parc) | Lawless, W. F. (Paine College) | Rossi, Francesca (IBM T. J. Watson Research Center and University of Padova) | Oliehoek, Frans A. (TU Delft) | Russell, Stephen (US Army Research Laboratory) | Takadama, Keiki (University of Electro-Communications) | Srivastava, Siddharth (Arizona State University) | Tuyls, Karl (Google DeepMind) | Allen, Philip Van (Art Center College of Design) | Venable, K. Brent (Tulane University and IHMC) | Vrancx, Peter (PROWLER.io) | Zhang, Shiqi (Cleveland State University)
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University's Department of Computer Science, presented the 2018 Spring Symposium Series, held Monday through Wednesday, March 26–28, 2018, on the campus of Stanford University. The seven symposia held were AI and Society: Ethics, Safety and Trustworthiness in Intelligent Agents; Artificial Intelligence for the Internet of Everything; Beyond Machine Intelligence: Understanding Cognitive Bias and Humanity for Well-Being AI; Data Efficient Reinforcement Learning; The Design of the User Experience for Artificial Intelligence (the UX of AI); Integrated Representation, Reasoning, and Learning in Robotics; and Learning, Inference, and Control of Multi-Agent Systems. This report, compiled from the organizers of the symposia, summarizes the research presented at five of the seven symposia.
Reinforcement Learning in POMDPs With Memoryless Options and Option-Observation Initiation Sets
Steckelmacher, Denis (Vrije Universiteit Brussel) | Roijers, Diederik M. (Vrije Universiteit Brussel) | Harutyunyan, Anna (Vrije Universiteit Brussel) | Vrancx, Peter (PROWLER.io) | Plisnier, Hélène (Vrije Universiteit Brussel) | Nowé, Ann (Vrije Universiteit Brussel)
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
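To make the mechanism concrete, here is a minimal Python sketch of how an OOI constrains option selection: an option's initiation set is simply the set of options allowed to precede it, so both the option policies and the top-level policy stay memoryless. All names here (Option, run_episode, the gym-style env interface) are illustrative assumptions, not the authors' implementation.

```python
import random

class Option:
    def __init__(self, name, policy, termination, initiation_set):
        self.name = name
        self.policy = policy              # memoryless: observation -> action
        self.termination = termination    # memoryless: observation -> stop probability
        # OOI: names of options that may precede this one; including None
        # makes the option available at the start of an episode.
        self.initiation_set = initiation_set

    def can_initiate(self, previous_option):
        # The initiation condition depends only on which option ran last,
        # not on any observation history.
        return previous_option in self.initiation_set

def run_episode(env, options, top_level_policy, max_steps=1000):
    obs, total_reward, previous = env.reset(), 0.0, None
    for _ in range(max_steps):
        # The top level is memoryless too: it conditions only on the current
        # observation and on which options' OOIs admit the previous option.
        candidates = [o for o in options if o.can_initiate(previous)]
        current = top_level_policy(obs, candidates)
        done = False
        while not done:  # execute the option until its termination fires
            obs, reward, done = env.step(current.policy(obs))
            total_reward += reward
            if random.random() < current.termination(obs):
                break
        previous = current.name
        if done:
            break
    return total_reward
```

The point of the sketch is that conditioning initiation on the previous option gives the agent a one-step memory over options for free, without adding memory to any individual policy.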
Learning With Options That Terminate Off-Policy
Harutyunyan, Anna (Vrije Universiteit Brussel) | Vrancx, Peter (PROWLER.io) | Bacon, Pierre-Luc (McGill University) | Precup, Doina (McGill University) | Nowé, Ann (Vrije Universiteit Brussel)
A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides the option's behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal and cannot express the optimal primitive policy well, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just as is done with policies in off-policy learning. To this end, we give a new algorithm, Q(β), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(β) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically and show that it holds up to its motivating claims.
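As a rough illustration of the decoupling, below is a one-step tabular sketch in the spirit of Q(β); the paper's full multi-step off-policy corrections are omitted, and the names (target_beta, alpha, gamma) are assumptions made for the example, not the authors' code.

```python
import numpy as np

def q_beta_step(Q, s, o, r, s_next, target_beta, alpha=0.1, gamma=0.99):
    """One TD update for the option-value table Q[state, option].

    target_beta(s_next) is the termination used in the *target*: with
    probability beta we bootstrap from the best option at s_next (as if the
    option had ended), and with probability (1 - beta) we keep bootstrapping
    from the same option. How the option actually terminated during behavior
    never enters the target, which is what makes the terminations off-policy.
    """
    beta = target_beta(s_next)
    continuation = Q[s_next, o]       # value of persisting with option o
    switch = np.max(Q[s_next])        # value of re-choosing greedily
    target = r + gamma * ((1 - beta) * continuation + beta * switch)
    Q[s, o] += alpha * (target - Q[s, o])
    return Q
```

In this reading, behavior terminations control how data is gathered, while target_beta controls which solution is learned, so the efficiency/quality trade-off in the abstract is moved from the data to the target.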
Decentralised Learning in Systems With Many, Many Strategic Agents
Mguni, David (PROWLER.io) | Jennings, Joel (PROWLER.io) | Munoz de Cote, Enrique (PROWLER.io)
Although multi-agent reinforcement learning can tackle systems of strategically interacting entities, current methods fail to scale and lack rigorous convergence guarantees. Crucially, learning in multi-agent systems can become intractable due to the explosion in the size of the joint state-action space as the number of agents increases. In this paper, we propose a method for computing closed-loop optimal policies in multi-agent systems that scales independently of the number of agents. This allows us to show, for the first time, successful convergence to optimal behaviour in systems with an unbounded number of interacting adaptive learners. Studying the asymptotic regime of N-player stochastic games, we devise a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large. Our method is model-free and completely decentralised, so each agent need only observe its local state information and its realised rewards. We validate these theoretical results by showing convergence to Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.
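The scaling claim can be pictured with a small sketch: when agents are symmetric and interchangeable, a single value table can be learned from purely local states and realised rewards, with the strategic coupling between agents entering only through the environment. The following Python sketch is written under those assumptions and is illustrative only, not the paper's protocol.

```python
import numpy as np

def decentralised_q_learning(env, n_states, n_actions, episodes=500,
                             alpha=0.1, gamma=0.95, epsilon=0.1):
    # In the symmetric (mean-field-like) regime all agents are
    # interchangeable, so one shared table suffices and the update cost
    # does not grow with the number of agents.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        states = env.reset()          # one local state per agent
        done = False
        while not done:
            # Fully decentralised acting: each agent sees only its own state.
            actions = [np.random.randint(n_actions)
                       if np.random.rand() < epsilon
                       else int(np.argmax(Q[s])) for s in states]
            next_states, rewards, done = env.step(actions)
            # Strategic coupling enters only through the realised rewards,
            # which the environment computes from the population's behaviour;
            # no agent needs to observe any other agent directly.
            for s, a, r, s2 in zip(states, actions, rewards, next_states):
                Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            states = next_states
    return Q
```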