
Collaborating Authors

 Schmid, Martin


Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents

arXiv.org Artificial Intelligence

While Poker, as a family of games, has been studied extensively in the last decades, collectible card games have seen relatively little attention. Only recently have we seen an agent that can compete with professional human players in Hearthstone, one of the most popular collectible card games. Although artificial agents must be able to work with imperfect information in both of these genres, collectible card games pose another set of distinct challenges: unlike in many poker variants, agents must deal with a vast state space. The goal of the game is to decrease the opponent's health to zero. There are many popular collectible card games, such as Magic: The Gathering [24], Hearthstone [3], The Elder Scrolls: Legends [20] and many others. A trait that makes collectible card games appealing to human players and challenging for AI agents is the broad range of ways to mix and match available cards into decks. Even small collectible card games with tens of available cards can offer more potential decks than the total number of atoms in the universe [17].


Learning not to Regret

arXiv.org Artificial Intelligence

Regret minimization is a key component of many algorithms for finding Nash equilibria in imperfect-information games. To scale to games that cannot fit in memory, we can use search with value functions. However, calling the value functions repeatedly in search can be expensive. Therefore, it is desirable to minimize regret in the search tree as fast as possible. We propose to accelerate the regret minimization by introducing a general "learning not to regret" framework, where we meta-learn the regret minimizer. The resulting algorithm is guaranteed to minimize regret in arbitrary settings and is (meta)-learned to converge fast on a selected distribution of games. Our experiments show that meta-learned algorithms converge substantially faster than prior regret minimization algorithms.
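For context, the sketch below shows classic regret matching in self-play on a small matrix game, i.e., the kind of fixed, hand-designed regret minimizer the paper proposes to meta-learn over; it is a standalone illustration, not the paper's meta-learned architecture, and the game and iteration count are arbitrary choices.

```python
import numpy as np

# Classic regret matching in self-play on rock-paper-scissors. A meta-learned
# variant in the spirit of the paper would replace `strategy_from_regret`
# with a learned update rule while keeping the no-regret guarantee.

PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)  # payoff to the row player

def strategy_from_regret(cumulative_regret):
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

def run_self_play(iterations=20_000):
    cumulative_regret = [np.zeros(3), np.zeros(3)]
    cumulative_strategy = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        sigma = [strategy_from_regret(r) for r in cumulative_regret]
        action_values = [PAYOFF @ sigma[1], -(PAYOFF.T @ sigma[0])]
        for player in range(2):
            expected = sigma[player] @ action_values[player]
            cumulative_regret[player] += action_values[player] - expected
            cumulative_strategy[player] += sigma[player]
    # The average strategies converge to a Nash equilibrium (uniform for RPS).
    return [s / s.sum() for s in cumulative_strategy]

if __name__ == "__main__":
    print(run_self_play())
```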


Player of Games

arXiv.org Artificial Intelligence

Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increase. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.


Search in Imperfect Information Games

arXiv.org Artificial Intelligence

From the very dawn of the field, search with value functions was a fundamental concept of computer games research. Turing's chess algorithm from 1950 was able to think two moves ahead, and Shannon's work on chess from 1950 includes an extensive section on evaluation functions to be used within a search. Samuel's checkers program from 1959 already combines search and value functions that are learned through self-play and bootstrapping. TD-Gammon improves upon those ideas and uses neural networks to learn those complex value functions -- only to be used again within search. The combination of decision-time search and value functions has been present in the remarkable milestones where computers bested their human counterparts in long-standing challenging games -- DeepBlue for Chess and AlphaGo for Go. Until recently, this powerful framework of search aided with (learned) value functions has been limited to perfect information games. As many interesting problems do not provide the agent with perfect information about the environment, this was an unfortunate limitation. This thesis introduces the reader to sound search for imperfect information games.


Solving Common-Payoff Games with Approximate Policy Iteration

arXiv.org Artificial Intelligence

For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is an NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. Code is available at https://github.com/ssokota/capi.
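To make the "small enough to brute force" regime concrete, here is a minimal brute-force search for the best deterministic joint policy in a tiny, hypothetical one-shot common-payoff game; it is not the CAPI algorithm, and the payoff rule and observation space are invented for illustration.

```python
from itertools import product

# Two players each privately observe one bit and pick one bit; both receive
# the same reward. We enumerate all deterministic joint policies.

def common_payoff(obs_a, obs_b, act_a, act_b):
    # Reward 1 when both actions equal the XOR of the private observations
    # (an invented payoff rule for illustration).
    return 1.0 if act_a == act_b == (obs_a ^ obs_b) else 0.0

observations = [0, 1]
actions = [0, 1]
# A deterministic policy maps each private observation to an action,
# represented as (action_for_obs_0, action_for_obs_1).
policies = list(product(actions, repeat=len(observations)))

best_value, best_joint = -1.0, None
for policy_a, policy_b in product(policies, policies):
    value = sum(common_payoff(oa, ob, policy_a[oa], policy_b[ob])
                for oa in observations for ob in observations) / 4.0
    if value > best_value:
        best_value, best_joint = value, (policy_a, policy_b)

print("optimal joint value:", best_value)   # 0.5 for this payoff rule
print("one optimal joint policy:", best_joint)
```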


Approximate exploitability: Learning a best response in large games

arXiv.org Machine Learning

A standard metric used to measure the approximate optimality of policies in imperfect information games is exploitability, i.e., the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that calculates an analogous metric using an approximate best response; the approximation is done using search and reinforcement learning. This is a generalization of local best response, a domain-specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that our method converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, including an AlphaZero agent.
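As a concrete reference point, the sketch below computes exact exploitability (NashConv style: the sum of both players' best-response gains) for a fixed policy profile in rock-paper-scissors. The paper's approximate exploitability replaces the exact best response, which needs a full game-tree traversal in large games, with one found by search and reinforcement learning; the game and policies here are illustrative choices.

```python
import numpy as np

PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)  # payoff to the row player

def exploitability(row_policy, col_policy):
    best_response_vs_col = np.max(PAYOFF @ col_policy)   # row player's best-response value
    best_response_vs_row = np.min(row_policy @ PAYOFF)   # column player pushes the row value down
    return best_response_vs_col - best_response_vs_row   # 0 exactly at a Nash equilibrium

uniform = np.full(3, 1 / 3)
rock_heavy = np.array([0.6, 0.2, 0.2])  # an intentionally exploitable policy

print(exploitability(uniform, uniform))        # 0.0
print(exploitability(rock_heavy, rock_heavy))  # 0.8
```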


The Advantage Regret-Matching Actor-Critic

arXiv.org Artificial Intelligence

Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. These retrospective value estimates are used to predict conditional advantages which, combined with regret matching, produce a new policy. In particular, ARMAC learns from sampled trajectories in a centralized training setting, without requiring the application of importance sampling commonly used in Monte Carlo counterfactual regret (CFR) minimization; hence, it does not suffer from excessive variance in large environments. In the single-agent setting, ARMAC shows an interesting form of exploration by keeping past policies intact. In the multiagent setting, ARMAC in self-play approaches Nash equilibria on some partially-observable zero-sum benchmarks. We provide exploitability estimates in the significantly larger game of betting-abstracted no-limit Texas Hold'em.
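The regret-matching step mentioned above can be sketched in a few lines: the next policy is proportional to the positive part of the cumulative advantage estimates at an information state. The advantage values below are placeholders; in ARMAC they would come from the learned critics, which are not modeled here.

```python
import numpy as np

# Turn predicted conditional advantages into the next policy via regret
# matching: probability proportional to the positive part of each action's
# cumulative advantage estimate.

def regret_matching_policy(cumulative_advantages):
    positive = np.maximum(cumulative_advantages, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # With no positively regretted action, fall back to the uniform policy.
    return np.full_like(cumulative_advantages, 1.0 / len(cumulative_advantages))

advantages = np.array([1.5, -0.3, 0.2])  # hypothetical estimates at one information state
print(regret_matching_policy(advantages))  # [0.882..., 0.0, 0.117...]
```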


Low-Variance and Zero-Variance Baselines for Extensive-Form Games

arXiv.org Artificial Intelligence

Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree that can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching only a fraction of the nodes on each iteration, at the expense of requiring more iterations to converge due to the variance of sampled values. In this paper, we extend recent work that uses baseline estimates to reduce this variance. We introduce a framework of baseline-corrected values in EFGs that generalizes the previous work. Within our framework, we propose new baseline functions that result in significantly reduced variance compared to existing techniques. We show that one particular choice of such a function -- predictive baseline -- is provably optimal under certain sampling schemes. This allows for efficient computation of zero-variance value estimates even along sampled trajectories.
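A minimal sketch of the baseline-corrected, control-variate style estimate this line of work builds on, assuming outcome sampling over three actions with made-up values: the sampled action's value is importance-corrected around its baseline, unsampled actions fall back to the baseline, the estimator stays unbiased, and its variance shrinks as the baseline approaches the true values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampled action a gets b(a) + (v(a) - b(a)) / q(a); unsampled actions keep b(a).
# All numbers below are made up for illustration.
true_values    = np.array([1.0, 0.2, -0.5])  # hypothetical true action values
sampling_probs = np.array([0.5, 0.3,  0.2])  # sampling distribution over actions
baseline       = np.array([0.9, 0.0, -0.4])  # a baseline close to the true values

def baseline_corrected_estimate(sampled_action):
    estimate = baseline.copy()
    a = sampled_action
    estimate[a] += (true_values[a] - baseline[a]) / sampling_probs[a]
    return estimate

samples = np.stack([baseline_corrected_estimate(rng.choice(3, p=sampling_probs))
                    for _ in range(50_000)])
print("mean estimate:      ", samples.mean(axis=0))  # close to true_values (unbiased)
print("per-action variance:", samples.var(axis=0))   # shrinks as the baseline improves
```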


Rethinking Formal Models of Partially Observable Multiagent Decision Making

arXiv.org Artificial Intelligence

Multiagent decision-making problems in partially observable environments are usually modeled as either extensive-form games (EFGs) within the game theory community or partially observable stochastic games (POSGs) within the reinforcement learning community. While most practical problems can be modeled in both formalisms, the communities using these models are mostly distinct, with little sharing of ideas or advances. The last decade has seen dramatic progress in algorithms for EFGs, mainly driven by the challenge problem of poker. We have seen computational techniques achieving super-human performance, some variants of poker are essentially solved, and there are now sound local search algorithms which were previously thought impossible. While the advances have garnered attention, the fundamental advances are not yet understood outside the EFG community. This can be largely explained by the starkly different formalisms between the game theory and reinforcement learning communities and, further, by the unsuitability of the original EFG formalism to make the ideas simple and clear. This paper aims to address these hindrances by advocating a new unifying formalism, a variant of POSGs, which we call Factored-Observation Games (FOGs). We prove that any timeable perfect-recall EFG can be efficiently modeled as a FOG, and we relate FOGs to other existing formalisms. Additionally, a FOG explicitly identifies the public and private components of observations, which is fundamental to the recent EFG breakthroughs. We conclude by presenting the two building blocks of these breakthroughs -- counterfactual regret minimization and public state decomposition -- in the new formalism, illustrating our goal of a simpler path for sharing recent advances between the game theory and reinforcement learning communities.
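A minimal data-structure sketch of the "factored observation" idea, assuming a simple string encoding: after each step, every player receives an observation split into a public component seen by all players and a private component seen only by that player. The field names and example are illustrative, not the paper's formal definition.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FactoredObservation:
    public: str                # observation common to all players
    private: Tuple[str, ...]   # one private observation per player

# A poker-like step: the bet is public, the dealt cards are private.
step = FactoredObservation(
    public="player_1_bets_2",
    private=("you_hold_K", "you_hold_Q"),
)

# Sequences of public observations induce public states, which is the
# decomposition the recent search-based breakthroughs rely on.
public_state_key = ("deal_cards", step.public)
print(public_state_key)
```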


Revisiting CFR+ and Alternating Updates

Journal of Artificial Intelligence Research

The CFR+ algorithm for solving imperfect information games is a variant of the popular CFR algorithm, with faster empirical performance on a range of problems. It was introduced with a theoretical upper bound on solution error, but subsequent work showed an error in one step of the proof. We provide updated proofs to recover the original bound.
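The core algorithmic difference can be sketched directly: where CFR accumulates raw counterfactual regrets at each information set, CFR+ clips the accumulated regret at zero after every update (regret matching+). CFR+ additionally uses alternating updates and weighted strategy averaging, which are not shown in this illustrative snippet with made-up regret values.

```python
import numpy as np

def cfr_update(cumulative, instantaneous):
    # CFR: accumulate raw counterfactual regrets.
    return cumulative + instantaneous

def cfr_plus_update(cumulative, instantaneous):
    # CFR+ (regret matching+): clip the accumulated regret at zero each step.
    return np.maximum(cumulative + instantaneous, 0.0)

instantaneous_regrets = [np.array([-2.0,  1.0]),
                         np.array([ 0.5, -0.5]),
                         np.array([ 1.0,  1.0])]

cfr_regret = np.zeros(2)
cfr_plus_regret = np.zeros(2)
for regret in instantaneous_regrets:
    cfr_regret = cfr_update(cfr_regret, regret)
    cfr_plus_regret = cfr_plus_update(cfr_plus_regret, regret)

print("CFR cumulative regrets: ", cfr_regret)       # [-0.5  1.5]
print("CFR+ cumulative regrets:", cfr_plus_regret)  # [ 1.5  1.5]
```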