Simulate Less, Expect More: Bringing Robot Swarms to Life via Low-Fidelity Simulations
Vega, Ricardo, Zhu, Kevin, Luke, Sean, Parsa, Maryam, Nowzari, Cameron
This paper proposes a novel methodology for addressing the simulation-reality gap in multi-robot swarm systems. Rather than immediately trying to shrink or 'bridge the gap' whenever a real-world experiment fails that worked in simulation, we characterize the conditions under which shrinking it is actually necessary. When these conditions are not satisfied, we show how very simple simulators can still be used both to (i) design new multi-robot systems and (ii) guide real-world swarming experiments toward certain emergent behaviors, even when the gap is very large. The key ideas are an iterative simulator-in-the-design-loop, in which real-world experiments, simulator modifications, and simulated experiments are intimately coupled in a way that minds the gap without needing to shrink it, and the use of minimally viable phase diagrams to guide real-world experiments. We demonstrate the usefulness of our methods by deploying a real multi-robot swarm system that successfully exhibits an emergent milling behavior.
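As a rough illustration of the kind of low-fidelity simulator the abstract refers to, here is a minimal point-robot sketch in which each robot carries a single binary forward sensor and switches between two fixed turning rates. The controller form, parameter values, and the crude order parameter are assumptions chosen for illustration, not the paper's actual models or results.

```python
import numpy as np

# Minimal low-fidelity point-robot simulator (illustrative sketch only).
# Each robot has a single binary forward sensor: 1 if another robot lies
# inside its sensing cone, 0 otherwise. It always drives forward at constant
# speed and picks one of two fixed turning rates based on the sensor bit.
# All parameter values below are assumptions, not taken from the paper.

N = 20                 # number of robots
SPEED = 0.1            # forward speed per step
TURN = (+0.3, -0.3)    # turning rate for sensor = 0 and sensor = 1
CONE = np.pi / 8       # half-angle of the sensing cone
RANGE = 3.0            # sensing range

rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, size=(N, 2))      # x, y positions
heading = rng.uniform(-np.pi, np.pi, N)    # headings in radians

def sensor_bits(pos, heading):
    """Return a 0/1 array: does any other robot fall inside each robot's cone?"""
    bits = np.zeros(N, dtype=int)
    for i in range(N):
        rel = pos - pos[i]
        dist = np.linalg.norm(rel, axis=1)
        ang = np.arctan2(rel[:, 1], rel[:, 0]) - heading[i]
        ang = (ang + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi]
        seen = (dist > 0) & (dist < RANGE) & (np.abs(ang) < CONE)
        bits[i] = int(seen.any())
    return bits

for step in range(5000):
    bits = sensor_bits(pos, heading)
    heading += np.where(bits == 1, TURN[1], TURN[0])
    pos += SPEED * np.column_stack((np.cos(heading), np.sin(heading)))

# A crude order parameter: spread of robot distances from the swarm centroid.
radii = np.linalg.norm(pos - pos.mean(axis=0), axis=1)
print("mean radius %.2f, std %.2f" % (radii.mean(), radii.std()))
```

Sweeping the assumed controller parameters and plotting such an order parameter against them is one way to build the kind of minimally viable phase diagram the abstract describes.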
Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space
Wei, Ermo, Wicke, Drew, Luke, Sean
We explore Deep Reinforcement Learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for these tasks in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG) to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
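To make the architectural idea concrete, here is a hedged sketch of a parameterized-action policy whose continuous-parameter head is conditioned on the output of the discrete-action head. The layer sizes, the softmax conditioning, and the PyTorch framing are illustrative assumptions rather than the paper's exact network.

```python
import torch
import torch.nn as nn

# Sketch of a compact parameterized-action policy in which the continuous
# parameter head is conditioned on the output of the discrete action head.
# Sizes and the conditioning scheme are assumptions chosen for illustration.

class ParameterizedActionPolicy(nn.Module):
    def __init__(self, state_dim, n_discrete, param_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Discrete head: logits over the discrete actions.
        self.discrete_head = nn.Linear(hidden, n_discrete)
        # Parameter head: sees both the state features and the discrete action
        # distribution, so the parameters are conditioned on the chosen action.
        self.param_head = nn.Sequential(
            nn.Linear(hidden + n_discrete, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim), nn.Tanh(),   # parameters scaled to [-1, 1]
        )

    def forward(self, state):
        h = self.trunk(state)
        logits = self.discrete_head(h)
        probs = torch.softmax(logits, dim=-1)
        params = self.param_head(torch.cat([h, probs], dim=-1))
        return logits, params

# Usage: sample a discrete action, then read off its continuous parameters.
policy = ParameterizedActionPolicy(state_dim=10, n_discrete=3, param_dim=4)
state = torch.randn(1, 10)
logits, params = policy(state)
action = torch.distributions.Categorical(logits=logits).sample()
print(action.item(), params.detach().numpy())
```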
Multiagent Soft Q-Learning
Wei, Ermo, Wicke, Drew, Freelan, David, Luke, Sean
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous control. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
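The relative overgeneralization pathology mentioned above can be illustrated with a small "max of two quadratics" style cooperative game: averaging over the other agent's exploratory actions favors a broad but suboptimal basin, while a soft (Boltzmann) distribution over joint actions keeps its mass on the narrow optimum. The payoff constants and temperature below are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Illustrative continuous cooperative game with relative overgeneralization.
# The joint reward has a narrow high peak at (5, 5) and a broad lower peak
# at (-5, -5). Constants are illustrative assumptions.

def reward(a, b):
    narrow = 10.0 - 2.0 * ((a - 5) ** 2 + (b - 5) ** 2)   # tall but narrow
    broad = 5.0 - 0.1 * ((a + 5) ** 2 + (b + 5) ** 2)     # lower but broad
    return np.maximum(narrow, broad)

actions = np.linspace(-10, 10, 201)
A, B = np.meshgrid(actions, actions, indexing="ij")
R = reward(A, B)

# Independent local search: each agent scores its action averaged over a
# uniform exploration distribution of the other agent. The broad, suboptimal
# basin dominates the average, which is relative overgeneralization.
marginal = R.mean(axis=1)
print("best action under marginalization:", actions[marginal.argmax()])  # near -5

# Joint (team) view of the same payoff: the true optimum is the narrow peak.
i, j = np.unravel_index(R.argmax(), R.shape)
print("jointly optimal actions:", actions[i], actions[j])                # near (5, 5)

# A soft (Boltzmann) distribution over joint actions, in the spirit of soft
# Q-learning, concentrates its mass on the narrow optimum; the temperature
# is an assumed illustrative value.
temperature = 0.5
p = np.exp((R - R.max()) / temperature)
p /= p.sum()
print("soft joint expectation:", (p * A).sum(), (p * B).sum())           # near (5, 5)
```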
LfD Training of Heterogeneous Formation Behaviors
Squires, William G. (George Mason University) | Luke, Sean (George Mason University)
Problem domains such as disaster relief, search and rescue, and games can benefit from having a human quickly train coordinated behaviors for a diverse set of agents. Hierarchical Training of Agent Behaviors (HiTAB) is a Learning from Demonstration (LfD) approach that addresses some inherent complexities in multiagent learning, making it possible to train complex heterogeneous behaviors from a small set of training samples. In this paper, we successfully demonstrate LfD training of formation behaviors using a small set of agents that, without retraining, continue to operate correctly when additional agents are available. We selected formations for the experiments because they require a great deal of coordination between agents, are heterogeneous due to the differing roles of participating agents, and can scale as the number of agents grows. We also introduce some extensions to HiTAB that facilitate this type of training.
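As a loose sketch of the learning-from-demonstration idea (not HiTAB's actual algorithm or formation behaviors), the following trains a tiny behavior selector from a handful of demonstrated (situation, behavior) samples and wraps it so it could serve as a sub-behavior in a larger hierarchy. The behaviors, features, and nearest-neighbor transition rule are illustrative assumptions.

```python
import numpy as np

# Illustrative-only sketch: a trained behavior whose "transition function" is
# a classifier fit to (feature vector -> behavior) demonstration samples, and
# which can itself be reused as a building block in a higher-level behavior.

def go_to_leader(agent):    return "moving toward leader"
def hold_offset(agent):     return "holding formation offset"
def avoid_obstacle(agent):  return "steering around obstacle"

BEHAVIORS = [go_to_leader, hold_offset, avoid_obstacle]

class TrainedBehavior:
    """A behavior built from demonstration samples; usable as a sub-behavior
    inside a higher-level trained behavior (the hierarchical part)."""

    def __init__(self):
        self.samples = []   # list of (feature vector, behavior index)

    def demonstrate(self, features, behavior_index):
        self.samples.append((np.asarray(features, dtype=float), behavior_index))

    def __call__(self, agent, features):
        # 1-nearest-neighbor rule: pick the behavior whose demonstrated
        # situation is closest to the current feature vector.
        feats = np.asarray(features, dtype=float)
        dists = [np.linalg.norm(feats - f) for f, _ in self.samples]
        _, idx = self.samples[int(np.argmin(dists))]
        return BEHAVIORS[idx](agent)

# Demonstration samples: features = (distance to formation slot, obstacle proximity).
formation_follower = TrainedBehavior()
formation_follower.demonstrate([5.0, 0.0], 0)   # far from slot  -> go to leader
formation_follower.demonstrate([0.2, 0.0], 1)   # in slot        -> hold offset
formation_follower.demonstrate([2.0, 0.9], 2)   # obstacle near  -> avoid

print(formation_follower(agent=None, features=[4.0, 0.1]))  # moving toward leader
```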
Bounty Hunting and Human-Agent Group Task Allocation
Wicke, Drew (George Mason University) | Luke, Sean (George Mason University)
Much research has been done to apply auctions, markets, and negotiation mechanisms to solve the multiagent task allocation problem. However, there has been very little work on human-agent group task allocation. We believe that the notion of bounty hunting has good properties for human-agent group interaction in dynamic task allocation problems. We use previous experimental results comparing bounty hunting with auction-like methods to argue why it would be particularly adept at handling scenarios with unreliable collaborators and unexpectedly hard tasks: scenarios we believe highlight the difficulties involved in working with human collaborators.
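As a toy illustration of why bounty hunting copes with unreliable collaborators and unexpectedly hard tasks, the sketch below lets each unfinished task's bounty grow over time so that abandoned or difficult tasks eventually attract another hunter. The growth rate, commitment rule, and failure model are assumptions, not the paper's mechanism.

```python
import random

# Toy bounty-hunting task allocation (illustrative sketch only).
# Each unfinished task carries a bounty that grows every step, so tasks that
# are unexpectedly hard or abandoned by unreliable agents keep getting richer
# until some bounty hunter finally completes them.

random.seed(1)

tasks = [{"id": t, "bounty": 1.0, "difficulty": random.randint(2, 8), "done": False}
         for t in range(5)]
agents = [{"id": a, "reliability": random.uniform(0.5, 1.0), "task": None, "progress": 0}
          for a in range(3)]

for step in range(100):
    for task in tasks:
        if not task["done"]:
            task["bounty"] += 0.5                       # bounty grows until completion
    for agent in agents:
        if agent["task"] is None:
            open_tasks = [t for t in tasks if not t["done"]]
            if open_tasks:                              # commit to the richest open task
                agent["task"] = max(open_tasks, key=lambda t: t["bounty"])
                agent["progress"] = 0
        else:
            task = agent["task"]
            if task["done"]:                            # someone else got there first
                agent["task"] = None
                continue
            if random.random() > agent["reliability"]:  # unreliable: abandon the task
                agent["task"] = None
                continue
            agent["progress"] += 1
            if agent["progress"] >= task["difficulty"]:
                task["done"] = True                      # collect the bounty
                print(f"step {step}: agent {agent['id']} finished task {task['id']} "
                      f"for bounty {task['bounty']:.1f}")
                agent["task"] = None
    if all(t["done"] for t in tasks):
        break
```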
Reports on the 2004 AAAI Fall Symposia
Cassimatis, Nick, Luke, Sean, Levy, Simon D., Gayler, Ross, Kanerva, Pentti, Eliasmith, Chris, Bickmore, Timothy, Schultz, Alan C., Davis, Randall, Landay, James, Miller, Rob, Saund, Eric, Stahovich, Tom, Littman, Michael, Singh, Satinder, Argamon, Shlomo, Dubnov, Shlomo
The Association for the Advancement of Artificial Intelligence presented its 2004 Fall Symposium Series Friday through Sunday, October 22-24 at the Hyatt Regency Crystal City in Arlington, Virginia, adjacent to Washington, DC. The symposium series was preceded by a one-day AI funding seminar. The topics of the eight symposia in the 2004 Fall Symposia Series were: (1) Achieving Human-Level Intelligence through Integrated Systems and Research; (2) Artificial Multiagent Learning; (3) Compositional Connectionism in Cognitive Science; (4) Dialogue Systems for Health Communications; (5) The Intersection of Cognitive Science and Robotics: From Interfaces to Intelligence; (6) Making Pen-Based Interaction Intelligent and Natural; (7) Real-Life Reinforcement Learning; and (8) Style and Meaning in Language, Art, Music, and Design.
Three RoboCup Simulation League Commentator Systems
Andre, Elisabeth, Binsted, Kim, Tanaka-Ishii, Kumiko, Luke, Sean, Herzog, Gerd, Rist, Thomas
Three systems that generate real-time natural language commentary on the RoboCup simulation league are presented, and their similarities, differences, and directions for future work are discussed. Although they emphasize different aspects of the commentary problem, all three systems take simulator data as input and generate appropriate, expressive, spoken commentary in real time.