Bowling, Michael
Approximate exploitability: Learning a best response in large games
Timbers, Finbarr, Lockhart, Edward, Lanctot, Marc, Schmid, Martin, Schrittwieser, Julian, Hubert, Thomas, Bowling, Michael
A standard metric used to measure the approximate optimality of policies in imperfect information games is exploitability, i.e. the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games, as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that computes an analogous value using an approximate best response; the approximation is obtained via search and reinforcement learning. This is a generalization of local best response, a domain-specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that it converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, including an AlphaZero agent.
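For intuition, exploitability can be computed exactly in a small two-player zero-sum matrix game. The sketch below is a minimal, illustrative example on rock-paper-scissors (not the paper's search-and-RL method): it measures how much a best-responding opponent gains against a fixed policy.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (zero-sum: the column
# player's payoff is the negation).
payoff = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def exploitability(policy: np.ndarray) -> float:
    """How much a best-responding opponent wins against `policy`.

    `policy` is the row player's mixed strategy.  The opponent best-responds
    by choosing the column that minimizes the row player's expected payoff;
    exploitability is the loss relative to the game value (0 for this game).
    """
    expected_vs_columns = policy @ payoff       # row player's value per opponent action
    best_response_value = expected_vs_columns.min()
    return 0.0 - best_response_value

print(exploitability(np.array([1/3, 1/3, 1/3])))  # 0.0 at the Nash equilibrium
print(exploitability(np.array([1.0, 0.0, 0.0])))  # 1.0: always-rock is fully exploitable
```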
The Advantage Regret-Matching Actor-Critic
Gruslys, Audrūnas, Lanctot, Marc, Munos, Rémi, Timbers, Finbarr, Schmid, Martin, Perolat, Julien, Morrill, Dustin, Zambaldi, Vinicius, Lespiau, Jean-Baptiste, Schultz, John, Azar, Mohammad Gheshlaghi, Bowling, Michael, Tuyls, Karl
Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. These retrospective value estimates are used to predict conditional advantages which, combined with regret matching, produce a new policy. In particular, ARMAC learns from sampled trajectories in a centralized training setting, without requiring the importance sampling commonly used in Monte Carlo counterfactual regret minimization (CFR); hence, it does not suffer from excessive variance in large environments. In the single-agent setting, ARMAC shows an interesting form of exploration by keeping past policies intact. In the multiagent setting, ARMAC in self-play approaches Nash equilibria on some partially-observable zero-sum benchmarks. We provide exploitability estimates in the significantly larger game of betting-abstracted no-limit Texas Hold'em.
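The regret-matching step that turns advantage (regret) estimates into a policy is standard and can be sketched in a few lines. This is a generic illustration of the positive-regret-matching rule, not ARMAC's full training loop.

```python
import numpy as np

def regret_matching_policy(advantages: np.ndarray) -> np.ndarray:
    """Map per-action advantage (regret) estimates to a policy.

    Actions are played in proportion to their positive advantages;
    if no action has positive advantage, fall back to uniform.
    """
    positive = np.maximum(advantages, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full_like(advantages, 1.0 / len(advantages))

print(regret_matching_policy(np.array([2.0, -1.0, 1.0])))  # [2/3, 0, 1/3]
```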
Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
Ahmad, Zaheen Farraz, Lelis, Levi H. S., Bowling, Michael
Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
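One reading of the marginal utility described above is the value a newly generated action adds over the best action generated so far. The sketch below is illustrative only; the action-value function `q` is a hypothetical stand-in for whatever value estimate the planner provides.

```python
from typing import Any, Callable, Sequence

def marginal_utility(q: Callable[[Any], float],
                     new_action: Any,
                     previous_actions: Sequence[Any]) -> float:
    """Increase in value that `new_action` adds over previously generated actions.

    `q` is a (hypothetical) action-value estimate for the current state.
    If the new action is no better than the best action already available,
    its marginal utility is zero.
    """
    if not previous_actions:
        return q(new_action)
    best_so_far = max(q(a) for a in previous_actions)
    return max(0.0, q(new_action) - best_so_far)
```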
Computing Robust Counter-Strategies
Johanson, Michael, Zinkevich, Martin, Bowling, Michael
Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed.
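A simple way to picture this trade-off, not necessarily the paper's exact formulation, is to score a counter-strategy in a matrix game as a p-weighted mixture of its value against the suspected opponent model and its worst-case value; `p` here is a hypothetical confidence parameter in the model.

```python
import numpy as np

def robust_score(payoff: np.ndarray,
                 counter_strategy: np.ndarray,
                 opponent_model: np.ndarray,
                 p: float) -> float:
    """Trade off exploiting a suspected opponent model against the worst case.

    `payoff[i, j]` is our payoff when we play action i and the opponent plays j.
    With probability `p` the opponent follows `opponent_model`; otherwise we
    assume a worst-case (best-responding) opponent.
    """
    vs_model = counter_strategy @ payoff @ opponent_model
    worst_case = (counter_strategy @ payoff).min()
    return p * vs_model + (1.0 - p) * worst_case
```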
Low-Variance and Zero-Variance Baselines for Extensive-Form Games
Davis, Trevor, Schmid, Martin, Bowling, Michael
Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree that can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching only a fraction of the nodes on each iteration, at the expense of requiring more iterations to converge due to the variance of sampled values. In this paper, we extend recent work that uses baseline estimates to reduce this variance. We introduce a framework of baseline-corrected values in EFGs that generalizes the previous work. Within our framework, we propose new baseline functions that result in significantly reduced variance compared to existing techniques. We show that one particular choice of such a function --- predictive baseline --- is provably optimal under certain sampling schemes. This allows for efficient computation of zero-variance value estimates even along sampled trajectories.
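The underlying control-variate idea, a baseline value substituted for unsampled actions and an importance-corrected residual for the sampled one, can be sketched as follows. This follows the generic pattern rather than the paper's exact estimator; a good baseline shrinks the correction term and hence the variance of the sampled estimate.

```python
def baseline_corrected_values(policy, baseline, sampled_action,
                              sampled_child_value, sample_prob):
    """Baseline-corrected action-value estimates at one decision point.

    `policy` and `baseline` map actions to probabilities and baseline values.
    Only `sampled_action` was actually explored; its importance-corrected
    estimate keeps the overall expectation unbiased, while unsampled actions
    fall back to their baseline values.
    """
    estimates = {}
    for action in policy:
        if action == sampled_action:
            correction = (sampled_child_value - baseline[action]) / sample_prob
            estimates[action] = baseline[action] + correction
        else:
            estimates[action] = baseline[action]
    # Expected value at this decision point under the current policy.
    state_value = sum(policy[a] * estimates[a] for a in policy)
    return estimates, state_value
```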
Rethinking Formal Models of Partially Observable Multiagent Decision Making
Kovařík, Vojtěch, Schmid, Martin, Burch, Neil, Bowling, Michael, Lisý, Viliam
Multiagent decision-making problems in partially observable environments are usually modeled as either extensive-form games (EFGs) within the game theory community or partially observable stochastic games (POSGs) within the reinforcement learning community. While most practical problems can be modeled in both formalisms, the communities using these models are mostly distinct with little sharing of ideas or advances. The last decade has seen dramatic progress in algorithms for EFGs, mainly driven by the challenge problem of poker. We have seen computational techniques achieving super-human performance, some variants of poker are essentially solved, and there are now sound local search algorithms which were previously thought impossible. While the advances have garnered attention, the fundamental advances are not yet understood outside the EFG community. This can be largely explained by the starkly different formalisms between the game theory and reinforcement learning communities and, further, by the unsuitability of the original EFG formalism to make the ideas simple and clear. This paper aims to address these hindrances by advocating a new unifying formalism, a variant of POSGs, which we call Factored-Observation Games (FOGs). We prove that any timeable perfect-recall EFG can be efficiently modeled as a FOG, and we relate FOGs to other existing formalisms. Additionally, a FOG explicitly identifies the public and private components of observations, which is fundamental to the recent EFG breakthroughs. We conclude by presenting the two building-blocks of these breakthroughs --- counterfactual regret minimization and public state decomposition --- in the new formalism, illustrating our goal of a simpler path for sharing recent advances between the game theory and reinforcement learning communities.
Ease-of-Teaching and Language Structure from Emergent Communication
Li, Fushan, Bowling, Michael
Artificial agents have been shown to learn to communicate when needed to complete a cooperative task. Some level of language structure (e.g., compositionality) has been found in the learned communication protocols. This observed structure is often the result of specific environmental pressures during training. By introducing new agents periodically to replace old ones, sequentially and within a population, we explore such a new pressure -- ease of teaching -- and show its impact on the structure of the resulting language.
The Hanabi Challenge: A New Frontier for AI Research
Bard, Nolan, Foerster, Jakob N., Chandar, Sarath, Burch, Neil, Lanctot, Marc, Song, H. Francis, Parisotto, Emilio, Dumoulin, Vincent, Moitra, Subhodeep, Hughes, Edward, Dunning, Iain, Mourad, Shibl, Larochelle, Hugo, Bellemare, Marc G., Bowling, Michael
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Srinivasan, Sriram, Lanctot, Marc, Zambaldi, Vinicius, Perolat, Julien, Tuyls, Karl, Munos, Remi, Bowling, Michael
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
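For reference, a plain tabular softmax policy-gradient (advantage actor-critic) update, the kind of gradient-ascent approach the abstract mentions, can be written in a few lines. This is a generic sketch, not one of the paper's candidate update rules.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def actor_critic_update(logits, q_values, action, lr=0.1):
    """One tabular softmax policy-gradient step at a single state.

    The critic's action values `q_values` are turned into an advantage by
    subtracting the current policy's expected value; the sampled `action`'s
    log-probability is then nudged in proportion to that advantage.
    """
    pi = softmax(logits)
    advantage = q_values[action] - pi @ q_values
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0          # d log pi(action) / d logits
    return logits + lr * advantage * grad_log_pi
```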