AITopics | Waugh, Kevin

Waugh, Kevin

Diversifying AI: Towards Creative Chess with AlphaZero

Zahavy, Tom, Veeriah, Vivek, Hou, Shaobo, Waugh, Kevin, Lai, Matthew, Leurent, Edouard, Tomasev, Nenad, Schut, Lisa, Hassabis, Demis, Singh, Satinder

arXiv.org Artificial Intelligence, Aug-29-2023

In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans and that diversity is a valuable asset in solving computationally hard problems.
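To put the reported 50 Elo improvement in perspective, the gap can be converted into an expected score per game. The sketch below assumes the standard logistic Elo model with a 400-point scale; the abstract does not say which rating model the authors used, and the function name `expected_score` is hypothetical.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the stronger side.

    Assumes the standard logistic Elo model with a 400-point scale;
    this is an illustration, not the paper's evaluation code.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


if __name__ == "__main__":
    # A 50 Elo edge corresponds to roughly 57% of the available points.
    print(round(expected_score(50), 3))
```

Under this assumption, a 50 Elo edge corresponds to scoring about 57% of the available points against the baseline.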

artificial intelligence, creative chess, diversifying ai, (1 more...)

arXiv.org Artificial Intelligence

2308.09175

Genre: Research Report > New Finding (0.53)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Player of Games

Schmid, Martin, Moravcik, Matej, Burch, Neil, Kadlec, Rudolf, Davidson, Josh, Waugh, Kevin, Bard, Nolan, Timbers, Finbarr, Lanctot, Marc, Holland, Zach, Davoodi, Elnaz, Christianson, Alden, Bowling, Michael

arXiv.org Artificial Intelligence, Dec-6-2021

Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increase. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

artificial intelligence, machine learning, reinforcement learning, (23 more...)

arXiv.org Artificial Intelligence

2112.03178

Country:

Europe > United Kingdom > Scotland (0.26)
North America > United States > Texas (0.25)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games > Chess (0.88)
Leisure & Entertainment > Games > Poker (0.88)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(4 more...)

Solving Large Extensive-Form Games with Strategy Constraints

Davis, Trevor, Waugh, Kevin, Bowling, Michael

arXiv.org Artificial Intelligence, Sep-20-2018

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.

computer game, constraint, game theory, (18 more...)

arXiv.org Artificial Intelligence

1809.07893

Country: North America > Canada > Alberta (0.14)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Theoretical and Practical Advances on Smoothing for Extensive-Form Games

Kroer, Christian, Waugh, Kevin, Kilinc-Karzan, Fatma, Sandholm, Tuomas

arXiv.org Artificial Intelligence, May-8-2017

Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of $\Omega(b^dd)$, where $b$ is the branching factor of the player, and $d$ is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning we show that, for the first time, the excessive gap technique can be made faster than the fastest counterfactual regret minimization algorithm, CFR+, in practice.

artificial intelligence, convergence rate, game theory, (16 more...)

arXiv.org Artificial Intelligence

1702.04849

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.14)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Solving Games with Functional Regret Estimation

Waugh, Kevin (Carnegie Mellon University) | Morrill, Dustin (University of Alberta) | Bagnell, James Andrew (Carnegie Mellon University) | Bowling, Michael (University of Alberta)

AAAI Conferences, Mar-6-2015

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.

abstraction, artificial intelligence, game theory, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Alberta (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Solving Games with Functional Regret Estimation

Waugh, Kevin (Carnegie Mellon University) | Morrill, Dustin (University of Alberta) | Bagnell, James Andrew (Carnegie Mellon University) | Bowling, Michael (University of Alberta)

AAAI Conferences, Mar-1-2015

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.

abstraction, artificial intelligence, game theory, (19 more...)

AAAI Conferences

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Alberta (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Solving Games with Functional Regret Estimation

Waugh, Kevin, Morrill, Dustin, Bagnell, J. Andrew, Bowling, Michael

arXiv.org Artificial Intelligence, Dec-31-2014

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.

abstraction, artificial intelligence, game theory, (19 more...)

arXiv.org Artificial Intelligence

1411.7974

Country:

North America > Canada > Alberta (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Computational Rationalization: The Inverse Equilibrium Problem

Waugh, Kevin, Ziebart, Brian D., Bagnell, J. Andrew

arXiv.org Machine Learning, Aug-15-2013

Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior.

deviation, game theory, ground transportation, (20 more...)

arXiv.org Machine Learning

1308.3506

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (0.93)
Consumer Products & Services > Hotels (0.68)
Transportation > Ground > Road (0.67)
Law (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.54)

Strategy Purification

Ganzfried, Sam (Carnegie Mellon University) | Sandholm, Tuomas (Carnegie Mellon University) | Waugh, Kevin (Carnegie Mellon University)

AAAI Conferences, Aug-8-2011

There has been significant recent interest in computing effective practical strategies for playing large games. Most prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game. In this paper, we present a modification of this approach that works by constructing a deterministic strategy in the full game from the solution to the abstract game; we refer to this procedure as purification. We show that purification, and its generalization which we call thresholding, lead to significantly stronger play than the standard approach in a wide variety of experimental domains. First, we show that purification improves performance in random 4x4 matrix games using random 3x3 abstractions. We observe that whether or not purification helps in this setting depends crucially on the support of the equilibrium in the full game, and we precisely specify the supports for which purification helps. Next we consider a simplified version of poker called Leduc Hold'em; again we show that purification leads to a significant performance improvement over the standard approach, and furthermore that whenever thresholding improves a strategy, the biggest improvement is often achieved using full purification. Finally, we consider actual strategies that used our algorithms in the 2010 AAAI Computer Poker Competition. One of our programs, which uses purification, won the two-player no-limit Texas Hold'em bankroll division. Furthermore, experiments in two-player limit Texas Hold'em show that these performance gains do not necessarily come at the expense of worst-case exploitability and that our algorithms can actually produce strategies with lower exploitabilities than the standard approach.
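As a rough illustration of the two operations named in this abstract, the sketch below applies them to a single mixed strategy given as a list of action probabilities. It assumes purification puts all probability on the most likely action and thresholding drops actions below a cutoff and renormalizes. The function names, the tie-breaking rule (lowest index wins), and the fallback when every action falls below the cutoff are assumptions of this sketch, not details taken from the paper.

```python
from typing import List


def purify(strategy: List[float]) -> List[float]:
    """Return a deterministic strategy that plays the most probable action.

    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    best = max(range(len(strategy)), key=lambda i: strategy[i])
    return [1.0 if i == best else 0.0 for i in range(len(strategy))]


def threshold(strategy: List[float], cutoff: float) -> List[float]:
    """Zero out actions played with probability below `cutoff`, then renormalize.

    If every action falls below the cutoff, fall back to purification
    (also an assumption of this sketch).
    """
    kept = [p if p >= cutoff else 0.0 for p in strategy]
    total = sum(kept)
    if total == 0.0:
        return purify(strategy)
    return [p / total for p in kept]


if __name__ == "__main__":
    mixed = [0.1, 0.6, 0.3]
    print(purify(mixed))          # all weight on the second action
    print(threshold(mixed, 0.2))  # first action dropped, rest renormalized
```

With a cutoff above the second-largest probability, thresholding in this sketch gives the same result as purification, which matches the abstract's description of thresholding as a generalization.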

abstraction, artificial intelligence, game theory, (18 more...)

AAAI Conferences

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas (0.47)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Poker (0.69)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Games > Poker (0.71)

Monte Carlo Sampling for Regret Minimization in Extensive Games

Lanctot, Marc, Waugh, Kevin, Zinkevich, Martin, Bowling, Michael

Neural Information Processing Systems, Dec-31-2009

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
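CFR applies a regret-matching rule at each information set, and the self-play idea can be illustrated on a much smaller problem. The sketch below runs regret matching for both players of rock-paper-scissors, a one-shot matrix game, using exact expected values. It is not CFR over a game tree and it does not implement the outcome or external sampling schemes of MCCFR; the payoff matrix, function names, and the nonzero initial regrets (used only to start away from the uniform strategy) are assumptions chosen for illustration.

```python
from typing import List

# Row player's payoff for (rock, paper, scissors) vs. the column player's action.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations: int, initial_regrets: List[float]) -> List[List[float]]:
    """Run regret matching for both players; return their average strategies."""
    n = len(PAYOFF)
    regrets = [list(initial_regrets), [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in range(2):
            for a in range(n):
                strategy_sums[player][a] += strategies[player][a]
        # Expected value of each pure action against the opponent's current mix.
        row_values = [
            sum(PAYOFF[a][b] * strategies[1][b] for b in range(n)) for a in range(n)
        ]
        col_values = [
            -sum(PAYOFF[a][b] * strategies[0][a] for a in range(n)) for b in range(n)
        ]
        for player, values in enumerate((row_values, col_values)):
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(n):
                # Regret: how much better action a would have done than the mix.
                regrets[player][a] += values[a] - expected
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for avg in self_play(20000, [1.0, 0.0, 0.0]):
        print([round(p, 3) for p in avg])
```

In this zero-sum setting the average strategies, rather than the final ones, are what approach equilibrium, which for rock-paper-scissors is the uniform mix.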

artificial intelligence, game theory, iteration, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
