Learning effective policies in multi-agent adversarial games is a significant challenge since the search space can be prohibitively large when the actions of all the agents are considered simultaneously. Recent advances in Monte Carlo search methods have produced good results in single-agent games like Go with very large search spaces. In this paper, we propose a variation of UCT (Upper Confidence bounds applied to Trees), a Monte Carlo search method, for multi-agent, continuous-valued, adversarial games and demonstrate its utility in generating American football plays for Rush Football 2008. In football, as in many other multi-agent games, the actions of all of the agents are not equally crucial to gameplay success. By automatically identifying key players from historical game play, we can focus the UCT search on player groupings that have the largest impact on yardage gains in a particular formation.
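At the core of any UCT search is the UCB1 rule for choosing which child action to expand next, trading off exploitation of high-reward actions against exploration of rarely tried ones. The sketch below illustrates that rule in Python; the yardage statistics and the exploration constant are illustrative assumptions, not values from the paper.

```python
import math

def ucb1(mean_reward, visits, parent_visits, c=1.4):
    """UCB1 score used by UCT to select the next child to explore."""
    # Unvisited actions score infinity, so each one is tried at least once.
    if visits == 0:
        return float("inf")
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical statistics for three candidate key-player groupings,
# as (mean yardage gained, number of simulated visits) pairs.
stats = [(4.2, 30), (6.1, 5), (3.0, 15)]
parent_visits = sum(v for _, v in stats)

# UCT selects the grouping with the highest UCB1 score.
best = max(range(len(stats)), key=lambda i: ucb1(*stats[i], parent_visits))
```

The under-sampled second grouping wins here despite its small visit count, because the exploration bonus compensates for uncertainty in its mean yardage estimate.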
In this paper, we investigate the hypothesis that plan recognition can significantly improve the performance of a case-based reinforcement learner in an adversarial action selection task. Our environment is a simplification of an American football game. The performance task is to control the behavior of a quarterback in a pass play, where the goal is to maximize yardage gained. Plan recognition focuses on predicting the play of the defensive team. We modeled plan recognition as an unsupervised learning task, and conducted a lesion study. We found that plan recognition was accurate, and that it significantly improved performance. More generally, our studies show that plan recognition reduced the dimensionality of the state space, which allowed learning to be conducted more effectively. We describe the algorithms, explain the reasons for performance improvement, and also describe a further empirical comparison that highlights the utility of plan recognition for this task.
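The abstract above models plan recognition as an unsupervised learning task over defensive behavior. One minimal way to realize such a recognizer is to cluster historical defensive trajectories and then assign a newly observed play to the nearest cluster; the nearest-centroid scheme and the feature vectors below are our assumptions for illustration, not the paper's stated algorithm.

```python
import math

def nearest_centroid(features, centroids):
    """Assign an observed defensive trajectory (as a feature vector)
    to the closest learned play cluster by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda k: dist(features, centroids[k]))

# Hypothetical centroids for three defensive play clusters,
# learned offline from unlabeled historical plays.
centroids = [[0.0, 1.0], [3.0, 3.0], [5.0, 0.0]]

# Early-play feature vector of the defense in the current play.
observed = [2.8, 3.2]
play = nearest_centroid(observed, centroids)
```

Once the defensive play is recognized, the quarterback's learner only needs to select actions conditioned on that single cluster label rather than on the full joint defensive state, which is one way the dimensionality reduction described above can arise.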
One drawback of using plan recognition in adversarial games is that players must often commit to a plan before it is possible to infer the opponent's intentions. In such cases, it is valuable to couple plan recognition with plan repair, particularly in multi-agent domains where complete replanning is not computationally feasible. This paper presents a method for learning plan repair policies in real time using Upper Confidence Bounds applied to Trees (UCT). We demonstrate how these policies can be coupled with plan recognition in an American football game (Rush 2008) to create an autonomous offensive team capable of responding to unexpected changes in defensive strategy. Our real-time version of UCT learns play modifications that result in significantly higher average yardage and fewer interceptions than either the baseline game or domain-specific heuristics. Although it is possible to use the actual game simulator to measure reward offline, executing UCT in real time demands a different approach; here we describe two modules for reusing data from offline UCT searches to learn accurate state and reward estimators.
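The key idea in the last sentence above is to replace the expensive simulator with estimators trained on data from offline UCT searches. A simple stand-in for such a reward estimator is a k-nearest-neighbor average over recorded (state, reward) pairs; the function name, feature vectors, and yardage values below are hypothetical, chosen only to illustrate the reuse of offline search data.

```python
def estimate_reward(state, offline_samples, k=3):
    """Approximate the simulator's yardage reward at query time by
    averaging the rewards of the k nearest states recorded during
    offline UCT searches (squared Euclidean distance)."""
    scored = sorted(
        offline_samples,
        key=lambda sr: sum((a - b) ** 2 for a, b in zip(sr[0], state)),
    )
    nearest = scored[:k]
    return sum(r for _, r in nearest) / len(nearest)

# Hypothetical offline samples: (state features, observed yardage).
samples = [([0.0, 0.0], 2.0), ([1.0, 0.0], 4.0),
           ([0.0, 1.0], 3.0), ([5.0, 5.0], -1.0)]

est = estimate_reward([0.5, 0.2], samples)
```

Because the estimator is a cheap lookup rather than a full game rollout, it can be queried many times per decision within real-time constraints.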
Lucey, Patrick (Disney Research Pittsburgh) | Bialkowski, Alina (Queensland University of Technology and Disney Research Pittsburgh) | Carr, Peter (Disney Research Pittsburgh) | Foote, Eric (Disney Research Pittsburgh) | Matthews, Iain (Disney Research Pittsburgh)
Real-world AI systems have recently been deployed which can automatically analyze the plans and tactics of tennis players. As the game-state is updated regularly at short intervals (i.e. point-level), a library of successful and unsuccessful plans of a player can be learnt over time. Given the relative strengths and weaknesses of a player’s plans, a set of proven plans or tactics from the library that characterize a player can be identified. For low-scoring, continuous team sports like soccer, such analysis for multi-agent teams does not exist as the game is not segmented into “discretized” plays (i.e. plans), making it difficult to obtain a library that characterizes a team’s behavior. Additionally, as player tracking data is costly and difficult to obtain, we only have partial team traces in the form of ball actions, which makes this problem even more difficult. In this paper, we propose a method to overcome these issues by representing team behavior via play-segments, which are spatio-temporal descriptions of ball movement over fixed windows of time. Using these representations we can characterize team behavior from entropy maps, which give a measure of predictability of team behaviors across the field. We show the efficacy and applicability of our method on the 2010-2011 English Premier League soccer data.
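The entropy maps described above score how predictable a team's behavior is in each region of the field. The sketch below computes the Shannon entropy of the play-segment labels observed in a single field zone; the zone partition and the segment labels are illustrative assumptions, not the paper's actual representation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the play-segment labels observed in
    one field zone: low entropy means predictable behavior there."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical play-segment labels observed in two field zones.
attacking_third = ["cross", "cross", "cross", "shot"]
midfield = ["pass", "dribble", "cross", "pass", "clearance"]

h_attack = entropy(attacking_third)   # lower: behavior is predictable
h_midfield = entropy(midfield)        # higher: behavior is varied
```

Computing this quantity per zone and rendering the values over a field diagram yields an entropy map of the kind the abstract describes.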
We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges, including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics of this problem are quite complex, it is often possible to provide or learn a coarse simulation-based model of a tactical domain, which makes Monte-Carlo planning an attractive approach. In this paper, we investigate the use of UCT, a recent Monte-Carlo planning algorithm, for this problem. UCT has recently shown impressive successes in the area of games, particularly Go, but has not yet been considered in the context of multi-agent tactical planning. We discuss the challenges of adapting UCT to our domain and describe an implementation that allows for the optimization of user-specified objective functions. We present an evaluation of our approach on a range of tactical assault problems with different objectives in the RTS game Wargus. The results indicate that our planner is able to generate superior plans compared to several baselines and a human player.
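The abstract's central observation is that a coarse simulation model plus a user-specified objective is enough to evaluate candidate actions by Monte-Carlo rollouts. The sketch below shows that evaluation step in isolation; the simulator, the objective, and the unit-count state are toy stand-ins we introduce for illustration, not the paper's Wargus model.

```python
import random

def evaluate_action(simulate, state, action, objective, rollouts=50):
    """Monte-Carlo value estimate of one action: run the coarse
    stochastic simulator repeatedly from `state`, score each outcome
    with the user-specified objective, and average the scores."""
    total = 0.0
    for _ in range(rollouts):
        outcome = simulate(state, action)   # one noisy tactical rollout
        total += objective(outcome)
    return total / rollouts

# Toy model: state = friendly unit strength, action = assault intensity;
# the outcome is the strength surviving a noisy engagement.
random.seed(0)
sim = lambda strength, intensity: strength - intensity * random.random()

# Objective: maximize surviving friendly strength.
value = evaluate_action(sim, 10.0, 2.0, lambda outcome: outcome)
```

Swapping in a different objective function (e.g. minimizing assault duration instead of losses) changes which plans the planner prefers without touching the search itself, which is the flexibility the abstract highlights.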