Mendoza, Juan Pablo
Online Learning of Robot Soccer Free Kick Plans Using a Bandit Approach
Mendoza, Juan Pablo (Carnegie Mellon University) | Simmons, Reid (Carnegie Mellon University) | Veloso, Manuela (Carnegie Mellon University)
This paper presents an online learning approach for teams of autonomous soccer robots to select free kick plans. In robot soccer, free kicks present an opportunity to execute plans with relatively controllable initial conditions. However, the effectiveness of each plan is highly dependent on the adversary, and there are few free kicks during each game, making it necessary to learn online from sparse observations. To make learning feasible, we first greatly reduce the planning space by framing the problem as a contextual multi-armed bandit problem, in which the actions are a set of pre-computed plans, and the state is the position of the free kick on the field. During execution, we model the reward function for different free kicks using Gaussian Processes, and perform online learning using the Upper Confidence Bound algorithm. Results from a physics-based simulation show that the robots are capable of adapting to a variety of realistic opponents to maximize their expected reward during free kicks.
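The selection rule described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one independent Gaussian Process per pre-computed plan, a squared-exponential kernel over the 2D free kick position, a zero prior mean, and a reward of 1 for a successful free kick and 0 otherwise. The names `PlanRewardGP` and `select_plan`, and the values of `beta`, `noise_var`, and `length_scale`, are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of 2D field positions."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


class PlanRewardGP:
    """GP model of one plan's reward as a function of free kick position.

    Hypothetical helper: the kernel and hyperparameters are assumptions.
    """

    def __init__(self, noise_var=0.1, length_scale=1.0, signal_var=1.0):
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.positions = []  # observed free kick positions
        self.rewards = []    # observed rewards for this plan

    def update(self, position, reward):
        """Record the outcome of executing this plan at a position."""
        self.positions.append(np.asarray(position, dtype=float))
        self.rewards.append(float(reward))

    def predict(self, position):
        """Return the posterior mean and standard deviation at a position."""
        x = np.asarray(position, dtype=float)[None, :]
        if not self.positions:
            # No data yet: fall back to the zero-mean prior.
            return 0.0, float(np.sqrt(self.signal_var))
        X = np.stack(self.positions)
        y = np.asarray(self.rewards)
        K = rbf_kernel(X, X, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(X))
        k_star = rbf_kernel(X, x, self.length_scale, self.signal_var)[:, 0]
        mean = k_star @ np.linalg.solve(K, y)
        var = self.signal_var - k_star @ np.linalg.solve(K, k_star)
        return float(mean), float(np.sqrt(max(var, 0.0)))


def select_plan(models, position, beta=2.0):
    """Pick the plan index with the highest upper confidence bound."""
    scores = []
    for gp in models:
        mean, std = gp.predict(position)
        scores.append(mean + beta * std)
    return int(np.argmax(scores))


# Usage: two candidate plans, one free kick position.
models = [PlanRewardGP(), PlanRewardGP()]
kick_position = (0.0, 0.0)
models[1].update(kick_position, 1.0)  # plan 1 succeeded once here
choice_explore = select_plan(models, kick_position, beta=2.0)
choice_greedy = select_plan(models, kick_position, beta=0.0)
```

With `beta=2.0` the untried plan still has the larger bound, so it is selected for exploration. With `beta=0.0` the rule becomes purely greedy and picks the plan that has already succeeded. A larger `beta` favors exploration, which matters when each game offers only a few free kicks.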
Selectively Reactive Coordination for a Team of Robot Soccer Champions
Mendoza, Juan Pablo (Carnegie Mellon University) | Biswas, Joydeep (Carnegie Mellon University) | Cooksey, Philip (Carnegie Mellon University) | Wang, Richard (Carnegie Mellon University) | Klee, Steven (Carnegie Mellon University) | Zhu, Danny (Carnegie Mellon University) | Veloso, Manuela (Carnegie Mellon University)
CMDragons 2015 is the champion of the RoboCup Small Size League of autonomous robot soccer. The team won all six of its games, scoring a total of 48 goals and conceding 0. This unprecedented dominant performance is the result of various features, but we particularly credit our novel offensive multi-robot coordination. This paper thus presents our Selectively Reactive Coordination (SRC) algorithm, which consists of two layers. A coordinated opponent-agnostic layer enables the team to create its own plans, setting the pace of the game in offense. An individual opponent-reactive action selection layer enables the robots to maintain reactivity to different opponents. We demonstrate the effectiveness of our coordination through results from RoboCup 2015, and through controlled experiments using a physics-based simulator and an automated referee.