Johanson, Michael
Negotiating Team Formation Using Deep Reinforcement Learning
Bachrach, Yoram, Everett, Richard, Hughes, Edward, Lazaridou, Angeliki, Leibo, Joel Z., Lanctot, Marc, Johanson, Michael, Czarnecki, Wojciech M., Graepel, Thore
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotiation have been proposed, but typically only work for particular negotiation protocols. More general methods usually require human input or domain-specific data, and so do not scale. To address this, we propose a framework for training agents to negotiate and form teams using deep reinforcement learning. Importantly, our method makes no assumptions about the specific negotiation protocol, and is instead completely experience driven. We evaluate our approach on both non-spatial and spatially extended team-formation negotiation environments, demonstrating that our agents beat hand-crafted bots and reach negotiation outcomes consistent with fair solutions predicted by cooperative game theory. Additionally, we investigate how the physical location of agents influences negotiation outcomes.
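A standard notion of a "fair" allocation from cooperative game theory is the Shapley value, which negotiation outcomes like those above can be compared against. As a hedged illustration (the weights and quota below are made-up values, not taken from the paper), this sketch computes exact Shapley values for a small weighted voting game by enumerating join orders:

```python
from itertools import permutations

def shapley_values(weights, quota):
    """Exact Shapley values for a weighted voting game.

    A coalition "wins" (value 1) if its total weight meets the quota and
    loses (value 0) otherwise. Each player's Shapley value is their average
    marginal contribution over all possible join orders.
    """
    n = len(weights)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        running = 0
        for player in order:
            before = 1 if running >= quota else 0
            running += weights[player]
            after = 1 if running >= quota else 0
            totals[player] += after - before
    return [t / len(orders) for t in totals]

if __name__ == "__main__":
    # Illustrative 3-player game: any two players can form a winning team.
    print(shapley_values(weights=[1, 1, 1], quota=2))  # -> [1/3, 1/3, 1/3]
```

Exact enumeration over all n! join orders is only feasible for small games; larger games are usually handled by sampling permutations instead.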
Learned human-agent decision-making, communication and joint action in a virtual reality environment
Pilarski, Patrick M., Butcher, Andrew, Johanson, Michael, Botvinick, Matthew M., Bolt, Andrew, Parker, Adam S. R.
Humans make decisions and act alongside other humans to pursue both short-term and long-term goals. As a result of ongoing progress in areas such as computing science and automation, humans now also interact with non-human agents of varying complexity as part of their day-to-day activities; substantial work is being done to integrate increasingly intelligent machine agents into human work and play. With increases in the cognitive, sensory, and motor capacity of these agents, intelligent machinery for human assistance can now reasonably be considered to engage in joint action with humans---i.e., two or more agents adapting their behaviour and their understanding of each other so as to progress in shared objectives or goals. The mechanisms, conditions, and opportunities for skillful joint action in human-machine partnerships are of great interest to multiple communities. Despite this, human-machine joint action is as yet under-explored, especially in cases where a human and an intelligent machine interact in a persistent way during the course of real-time, daily-life experience. In this work, we contribute a virtual reality environment wherein a human and an agent can adapt their predictions, their actions, and their communication so as to pursue a simple foraging task. In a case study with a single participant, we provide an example of human-agent coordination and decision-making involving prediction learning on the part of the human and the machine agent, and control learning on the part of the machine agent, wherein audio communication signals are used to cue its human partner in service of acquiring shared reward. These results suggest the utility of studying human-machine coordination in a virtual reality environment, and identify further research that will expand our understanding of persistent human-machine joint action.
Solving Imperfect Information Games Using Decomposition
Burch, Neil (University of Alberta) | Johanson, Michael (University of Alberta) | Bowling, Michael (University of Alberta)
Decomposition, i.e. independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.
The AAAI-13 Conference Workshops
Agrawal, Vikas (IBM Research-India) | Archibald, Christopher (Mississippi State University) | Bhatt, Mehul (University of Bremen) | Bui, Hung (Nuance) | Cook, Diane J. (Washington State University) | Cortés, Juan (University of Toulouse) | Geib, Christopher (Drexel University) | Gogate, Vibhav (University of Texas at Dallas) | Guesgen, Hans W. (Massey University) | Jannach, Dietmar (TU Dortmund) | Johanson, Michael (University of Alberta) | Kersting, Kristian (University of Bonn) | Konidaris, George (Massachusetts Institute of Technology) | Kotthoff, Lars (University College Cork) | Michalowski, Martin (Adventium Labs) | Natarajan, Sriraam (Indiana University) | O'Sullivan, Barry (University College Cork) | Pickett, Marc (Naval Research Laboratory) | Podobnik, Vedran (University of Zagreb) | Poole, David (University of British Columbia) | Shastri, Lokendra (GM Research, India) | Shehu, Amarda (George Mason University) | Sukthankar, Gita (University of Central Florida)
Finding Optimal Abstract Strategies in Extensive-Form Games
Johanson, Michael (University of Alberta) | Bard, Nolan (University of Alberta) | Burch, Neil (University of Alberta) | Bowling, Michael (University of Alberta)
Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be tractably solved, while hoping that the resulting abstract game equilibrium is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies --- strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold'em.
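Exploitability measures how much a best-responding opponent can gain against a fixed strategy, relative to the game value. As a simplified stand-in for the extensive-form setting of the paper (the payoff matrix and strategies below are illustrative assumptions), the following sketch computes the exploitability of a row strategy in a zero-sum matrix game:

```python
import numpy as np

def exploitability(row_strategy, payoff, game_value=0.0):
    """How much a best-responding column player gains against a fixed row strategy.

    payoff[i, j] is the row player's payoff in a zero-sum matrix game. The column
    player best-responds by minimizing the row player's expected payoff, so the
    row strategy's exploitability is the game value minus that worst-case payoff.
    """
    worst_case = np.min(row_strategy @ payoff)  # value of the column player's best response
    return game_value - worst_case

if __name__ == "__main__":
    # Rock-paper-scissors (game value 0): uniform play is unexploitable,
    # while always playing rock can be exploited for 1 per game (paper wins).
    rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
    print(exploitability(np.array([1/3, 1/3, 1/3]), rps))  # ~0.0
    print(exploitability(np.array([1., 0., 0.]), rps))     # 1.0
```

In large extensive-form games the best response itself must be computed over the full game tree rather than a single matrix, which is what makes measuring (and minimizing) real-game exploitability of abstract strategies expensive.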
Computing Robust Counter-Strategies
Johanson, Michael, Zinkevich, Martin, Bowling, Michael
Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold'em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses.
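One way to picture the "modified game" is to restrict the opponent so that, with probability p, they must follow a fixed model of their suspected tendency, while with probability 1 - p they may play anything. The sketch below solves such a restricted game for a small zero-sum matrix game with a linear program; the game, the opponent model, and the values of p are illustrative assumptions, and the paper itself works with large extensive-form games rather than matrix games:

```python
import numpy as np
from scipy.optimize import linprog

def restricted_response(payoff, opponent_model, p):
    """Best compromise strategy against an opponent who plays a fixed model
    with probability p and best-responds with probability 1 - p.

    payoff[i, j] is the row player's payoff in a zero-sum matrix game.
    Solves  max_x min_j  x . A (p * model + (1 - p) * e_j)  as a linear program.
    """
    n, m = payoff.shape
    # Variables: x_1..x_n (strategy) and v (guaranteed value); minimize -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub, b_ub = [], []
    for j in range(m):
        mix = p * opponent_model + (1 - p) * np.eye(m)[j]
        A_ub.append(np.append(-(payoff @ mix), 1.0))   # v - x . (A mix) <= 0
        b_ub.append(0.0)
    A_eq = [np.append(np.ones(n), 0.0)]                # strategy sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], -res.fun

if __name__ == "__main__":
    # Rock-paper-scissors against a model that plays rock 60% of the time.
    rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
    model = np.array([0.6, 0.2, 0.2])
    for p in (0.0, 0.5, 1.0):
        strategy, value = restricted_response(rps, model, p)
        print(p, np.round(strategy, 3), round(value, 3))
```

Setting p = 1 recovers a plain best response to the model (maximally exploitative but potentially very exploitable), while p = 0 recovers a maximin strategy; intermediate values trade off exploitation of the suspected tendency against worst-case robustness.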
Regret Minimization in Games with Incomplete Information
Zinkevich, Martin, Johanson, Michael, Bowling, Michael, Piccione, Carmelo
Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.
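Counterfactual regret minimization applies regret minimization independently at every information set of an extensive-form game. As a much-reduced but runnable illustration of the underlying principle (regret-matching self-play drives the average strategies toward a Nash equilibrium in zero-sum games), the sketch below uses a small matrix game; the particular game and iteration count are illustrative assumptions, not details from the paper:

```python
import numpy as np

def regret_matching_selfplay(payoff, iterations=50000):
    """Self-play regret matching in a zero-sum matrix game.

    payoff[i, j] is the row player's payoff. Each player accumulates regret
    for every pure action against the opponent's current mixed strategy and
    plays proportionally to positive regret; the *average* strategies converge
    to a Nash equilibrium. CFR applies the same update at every information
    set of an extensive-form game, using counterfactual values as payoffs.
    """
    n, m = payoff.shape
    regret_row, regret_col = np.zeros(n), np.zeros(m)
    sum_row, sum_col = np.zeros(n), np.zeros(m)

    def strategy(regret, size):
        positive = np.maximum(regret, 0.0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(size, 1.0 / size)

    for _ in range(iterations):
        s_row = strategy(regret_row, n)
        s_col = strategy(regret_col, m)
        sum_row += s_row
        sum_col += s_col
        value = s_row @ payoff @ s_col
        regret_row += payoff @ s_col - value    # row player maximizes payoff
        regret_col += value - s_row @ payoff    # column player minimizes it
    return sum_row / iterations, sum_col / iterations

if __name__ == "__main__":
    # Biased rock-paper-scissors where rock beats scissors for 2 instead of 1;
    # the unique equilibrium is (1/4, 1/2, 1/4), which the averages approach.
    game = np.array([[0., -1., 2.], [1., 0., -1.], [-2., 1., 0.]])
    avg_row, avg_col = regret_matching_selfplay(game)
    print(np.round(avg_row, 3), np.round(avg_col, 3))
```

The key property the paper establishes is that minimizing counterfactual regret at each information set bounds overall regret, so this same convergence argument extends to abstractions of full-scale poker.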