Kaisers, Michael
Mastering Board Games by External and Internal Planning with Language Models
Schultz, John, Adamek, Jakub, Jusup, Matej, Lanctot, Marc, Kaisers, Michael, Perrin, Sarah, Hennes, Daniel, Shar, Jeremy, Lewis, Cannada, Ruoss, Anian, Zahavy, Tom, Veličković, Petar, Prince, Laurel, Singh, Satinder, Malmi, Eric, Tomašev, Nenad
While large language models perform well on a range of complex tasks (e.g., text generation, question answering, summarization), robust multi-step planning and reasoning remains a considerable challenge for them. In this paper we show that search-based planning can significantly improve LLMs' playing strength across several board games (Chess, Fischer Random / Chess960, Connect Four, and Hex). We introduce, compare and contrast two major approaches: In external search, the model guides Monte Carlo Tree Search (MCTS) rollouts and evaluations without calls to an external engine, and in internal search, the model directly generates in-context a linearized tree of potential futures and a resulting final choice. Both build on a language model pre-trained on relevant domain knowledge, capturing the transition and value functions across these games. We find that our pre-training method minimizes hallucinations, as our model is highly accurate regarding state prediction and legal moves. Additionally, both internal and external search indeed improve win-rates against state-of-the-art bots, even reaching Grandmaster-level performance in chess while operating on a similar move count search budget per decision as human Grandmasters. The way we combine search with domain knowledge is not specific to board games, suggesting direct extensions into more general language model inference and training techniques.
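To make the external-search idea concrete, here is a minimal Python sketch of model-guided MCTS: a model supplies move priors and a value estimate at each expanded node, replacing random rollouts and any calls to an external engine. The model(state) and env.next_state(state, move) interfaces, the PUCT constant, and every other detail are illustrative assumptions, not the paper's implementation.

    import math

    class Node:
        """One game state in the search tree."""
        def __init__(self, state, prior=1.0):
            self.state = state
            self.prior = prior        # move probability suggested by the model
            self.children = {}        # move -> Node
            self.visits = 0
            self.value_sum = 0.0

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def puct_select(node, c=1.5):
        """Pick the child maximising a PUCT-style score (model prior + value estimate)."""
        total = sum(child.visits for child in node.children.values())
        def score(child):
            return child.q() + c * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return max(node.children.items(), key=lambda kv: score(kv[1]))

    def external_search(root_state, model, env, num_simulations=50):
        """Model-guided MCTS sketch: model(state) is assumed to return
        (move priors, value in [-1, 1]) and env.next_state(state, move) the
        successor state; both interfaces are hypothetical."""
        root = Node(root_state)
        for _ in range(num_simulations):
            node, path = root, [root]
            while node.children:                      # selection
                _, node = puct_select(node)
                path.append(node)
            priors, value = model(node.state)         # expansion + model evaluation
            for move, p in priors.items():
                node.children[move] = Node(env.next_state(node.state, move), prior=p)
            for n in reversed(path):                  # backup, alternating sign per player
                n.visits += 1
                n.value_sum += value
                value = -value
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

Internal search, by contrast, asks the model itself to generate the linearized tree (expansions, evaluations, and the final choice) within a single in-context generation.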
Soft Condorcet Optimization for Ranking of General Agents
Lanctot, Marc, Larson, Kate, Kaisers, Michael, Berthet, Quentin, Gemp, Ian, Diaz, Manfred, Maura-Rivero, Roberto-Rafael, Bachrach, Yoram, Koop, Anna, Precup, Doina
A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. Comparing the performance of general agents requires aggregating their individual performances across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.
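The following sketch illustrates one way the general idea can be instantiated: learn one scalar rating per agent by gradient descent on a sigmoid-relaxed count of pairwise ordering mistakes over the votes. The loss shape, temperature, and function names are assumptions introduced for illustration, not the paper's exact formulation.

    import numpy as np

    def soft_condorcet_ratings(votes, num_agents, lr=0.1, temperature=1.0, steps=2000, seed=0):
        """Fit one rating per agent by minimising a sigmoid-relaxed count of pairwise
        disagreements with the votes (an illustrative formulation).

        votes: list of (winner, loser) index pairs extracted from the evaluation data.
        Returns an array of ratings; higher means ranked better."""
        rng = np.random.default_rng(seed)
        ratings = rng.normal(scale=0.01, size=num_agents)
        winners = np.array([w for w, _ in votes])
        losers = np.array([l for _, l in votes])
        for _ in range(steps):
            margin = (ratings[winners] - ratings[losers]) / temperature
            p_wrong = 1.0 / (1.0 + np.exp(margin))       # soft "mistake" for each vote
            coef = p_wrong * (1.0 - p_wrong) / temperature
            grad = np.zeros(num_agents)                   # gradient of sum(p_wrong)
            np.add.at(grad, winners, -coef)               # raising a winner lowers the loss
            np.add.at(grad, losers, coef)                 # raising a loser raises the loss
            ratings -= lr * grad
        return ratings

    # Tiny example: agent 0 beats 1, 1 beats 2, and 0 beats 2, so 0 > 1 > 2 is expected.
    print(soft_condorcet_ratings([(0, 1), (1, 2), (0, 2)], num_agents=3))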
Approximating the Core via Iterative Coalition Sampling
Gemp, Ian, Lanctot, Marc, Marris, Luke, Mao, Yiran, Duéñez-Guzmán, Edgar, Perrin, Sarah, Gyorgy, Andras, Elie, Romuald, Piliouras, Georgios, Kaisers, Michael, Hennes, Daniel, Bullard, Kalesha, Larson, Kate, Bachrach, Yoram
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches, namely solving large linear programs. As such, they scale better to very large problems, as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
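As a rough illustration of the coalition-sampling idea (and explicitly not the paper's algorithm), the sketch below repeatedly samples coalitions in a weighted voting game and, whenever a sampled coalition could profitably deviate, moves payoff toward its members while keeping the allocation efficient. All update rules, constants, and function names are assumptions.

    import random

    def weighted_voting_value(coalition, weights, quota):
        """Characteristic function of a weighted voting game: 1 if the coalition meets
        the quota, else 0."""
        return 1.0 if sum(weights[i] for i in coalition) >= quota else 0.0

    def sampled_payoff(weights, quota, iterations=20000, lr=0.01, seed=0):
        """Illustrative coalition-sampling loop: sample a coalition, and if it could
        profitably deviate from the current allocation, move payoff toward its members
        and renormalise so the allocation stays efficient."""
        rng = random.Random(seed)
        n = len(weights)
        grand_value = weighted_voting_value(range(n), weights, quota)
        payoff = [grand_value / n] * n                        # start from the equal split
        for _ in range(iterations):
            coalition = [i for i in range(n) if rng.random() < 0.5]
            if not coalition or len(coalition) == n:
                continue
            deficit = weighted_voting_value(coalition, weights, quota) - sum(payoff[i] for i in coalition)
            if deficit > 0:                                    # the coalition is "unhappy"
                for i in coalition:
                    payoff[i] += lr * deficit / len(coalition)
                total = sum(payoff)
                payoff = [p * grand_value / total for p in payoff]
        return payoff

    # Example: a 4-player weighted voting game with weights (4, 2, 2, 1) and quota 7,
    # in which player 0 is a veto player.
    print([round(p, 3) for p in sampled_payoff([4, 2, 2, 1], quota=7)])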
TacticAI: an AI assistant for football tactics
Wang, Zhe, Veličković, Petar, Hennes, Daniel, Tomašev, Nenad, Prince, Laurel, Kaisers, Michael, Bachrach, Yoram, Elie, Romuald, Wenliang, Li Kevin, Piccinini, Federico, Spearman, William, Graham, Ian, Connor, Jerome, Yang, Yi, Recasens, Adrià, Khan, Mina, Beauguerlange, Nathalie, Sprechmann, Pablo, Moreno, Pol, Heess, Nicolas, Bowling, Michael, Hassabis, Demis, Tuyls, Karl
Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.
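For intuition only, the untrained sketch below shows the kind of geometric representation such a system can operate on: each player is a node with positional features, one round of attention-weighted message passing mixes information across the fully connected player graph, and a head scores attacking players as candidate receivers. The feature set, dimensions, and random weights are purely illustrative and unrelated to TacticAI's actual architecture or data.

    import numpy as np

    rng = np.random.default_rng(0)

    # Schematic corner-kick snapshot: 22 players with (x, y, vx, vy, is_attacker) features.
    players = rng.normal(size=(22, 5))
    players[:, 4] = (np.arange(22) < 11).astype(float)     # first 11 players attack

    d_model = 16
    W_in = rng.normal(scale=0.1, size=(5, d_model))          # node encoder
    W_msg = rng.normal(scale=0.1, size=(d_model, d_model))   # message transform
    w_out = rng.normal(scale=0.1, size=d_model)              # receiver scorer

    h = np.tanh(players @ W_in)                              # encode each player

    # One round of attention-weighted message passing over the fully connected graph.
    scores = h @ h.T / np.sqrt(d_model)
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    h = np.tanh(h + attn @ (h @ W_msg))

    # Score the receiver: softmax over attacking players only (weights are random,
    # so the output is meaningless and shown only to illustrate the data flow).
    logits = h @ w_out
    logits[players[:, 4] == 0] = -np.inf
    probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
    print("predicted receiver index:", int(np.argmax(probs)))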
BRExIt: On Opponent Modelling in Expert Iteration
Hernandez, Daniel, Baier, Hendrik, Kaisers, Michael
Finding a best response policy is a central objective in game theory and multi-agent learning, with modern population-based training approaches employing reinforcement learning algorithms as best-response oracles to improve play against candidate opponents (typically previously learnt policies). We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response. In an empirical ablation on BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt.
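The two ingredients can be sketched in a few lines: biasing move priors at opponent nodes toward an opponent model, and adding an auxiliary cross-entropy term so the apprentice also predicts the opponent's policy. The blending scheme, loss weighting, and function names below are assumptions for illustration.

    import numpy as np

    def blend_opponent_priors(search_prior, opponent_model_prob, mix=0.5):
        """At opponent-to-move nodes, bias the planner's move priors toward an
        opponent model's predicted policy (the blending scheme is an assumption)."""
        blended = (1 - mix) * search_prior + mix * opponent_model_prob
        return blended / blended.sum()

    def apprentice_loss(policy_logits, mcts_target, opp_logits, opp_target, aux_weight=0.1):
        """Apprentice loss with an auxiliary opponent-policy head: imitate the search
        policy and, as an extra task, predict the opponent's policy."""
        def xent(logits, target):
            logp = logits - np.log(np.exp(logits).sum())
            return -(target * logp).sum()
        return xent(policy_logits, mcts_target) + aux_weight * xent(opp_logits, opp_target)

    # Toy example with three legal opponent moves: the planner's prior is uniform,
    # but the opponent model believes move 2 is very likely.
    print(blend_opponent_priors(np.full(3, 1 / 3), np.array([0.1, 0.1, 0.8]), mix=0.7))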
Online Planning in POMDPs with Self-Improving Simulators
He, Jinke, Suau, Miguel, Baier, Hendrik, Kaisers, Michael, Oliehoek, Frans A.
How can we plan efficiently in a large and complex environment when the time budget is limited? There are three main limitations of the "two-phase" paradigm, where a simulator is learned offline and then used as-is for online simulation and planning. First, no planning is possible until the offline learning phase finishes, which can take a long time. Second, the separation of learning and planning raises the question of what data collection policy should be used during training to ensure good online prediction during planning. We empirically demonstrate that when the training data is collected by a uniform random policy, the learned influence predictors can perform poorly during online planning, due to distribution shift. Third, completely replacing the original simulator with the approximate one after training implies a risk of poor planning performance in certain situations, which is hard to detect in advance. Given the original simulator of the environment, which may be computationally very demanding, we instead propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively decides which simulator to use for every simulation, based on a statistic that measures the accuracy of the approximate simulator. This allows us to use the approximate simulator to replace the original simulator for faster simulations when it is accurate enough under the current context, thus trading off simulation speed and accuracy.
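A minimal sketch of the adaptive choice between simulators might look as follows; the agreement-rate statistic, probing scheme, and class interface are illustrative assumptions rather than the method in the paper.

    import random

    class AdaptiveSimulatorSwitch:
        """Per-simulation choice between the original simulator and a learned, faster
        approximation, driven by a running accuracy statistic (a simple empirical
        agreement rate, used purely for illustration)."""

        def __init__(self, slow_sim, fast_sim, threshold=0.9, probe_prob=0.1, rng=None):
            self.slow_sim = slow_sim        # original, computationally demanding simulator
            self.fast_sim = fast_sim        # approximate simulator being learned online
            self.threshold = threshold
            self.probe_prob = probe_prob
            self.rng = rng or random.Random(0)
            self.agreements = 0
            self.checks = 0

        def accuracy(self):
            return self.agreements / self.checks if self.checks else 0.0

        def step(self, state, action):
            # While the approximation is unproven (or on occasional probes), call both
            # simulators, score their agreement, and return the trusted result.
            if self.rng.random() < self.probe_prob or self.accuracy() < self.threshold:
                true_next = self.slow_sim(state, action)
                self.checks += 1
                self.agreements += int(self.fast_sim(state, action) == true_next)
                return true_next
            # Once accurate enough under the current context, use the fast simulator.
            return self.fast_sim(state, action)

    # Toy usage: the fast simulator disagrees with the slow one about 5% of the time.
    noise = random.Random(1)
    slow = lambda s, a: (s + a) % 10
    fast = lambda s, a: (s + a) % 10 if noise.random() > 0.05 else (s + a + 1) % 10
    switch = AdaptiveSimulatorSwitch(slow, fast)
    print([switch.step(s, 1) for s in range(10)], round(switch.accuracy(), 2))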
Robust temporal difference learning for critical domains
Klima, Richard, Bloembergen, Daan, Kaisers, Michael, Tuyls, Karl
We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SREs) in critical domains. The operator, which we call the $\kappa$-operator, allows learning a safe policy in a model-based fashion without actually observing SREs. We introduce single- and multi-agent robust TD methods using the operator $\kappa$. We prove convergence of the operator to the optimal safe Q-function with respect to the model using the theory of Generalized Markov Decision Processes. In addition, we prove convergence to the optimal Q-function of the original MDP given that the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of $\kappa$-based TD methods both in the early learning phase and in the final converged stage. In addition, we show the robustness of the proposed method to small model errors, as well as its applicability in a multi-agent context.
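In the spirit of the $\kappa$-operator, but not its exact definition, a robust tabular TD update can mix the usual greedy bootstrap value with a model-based value under an SRE, weighted by kappa, as in the sketch below; all names and constants are assumptions.

    def robust_q_update(q, state, action, reward, next_state, actions, sre_value,
                        kappa=0.1, alpha=0.1, gamma=0.95):
        """One tabular TD update with a robustness term: the bootstrap target mixes
        the usual greedy next-state value with a model-based value under a significant
        rare event, weighted by kappa. The blending used here is an illustrative
        assumption, not the operator as defined in the paper.

        q:          dict mapping (state, action) -> value estimate
        sre_value:  model-based estimate of the value if an SRE strikes in next_state
        """
        greedy_next = max(q.get((next_state, a), 0.0) for a in actions)
        target = reward + gamma * ((1 - kappa) * greedy_next + kappa * sre_value)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (target - old)

    # Toy usage on a two-action problem: an SRE in state 1 would be catastrophic (-10).
    q = {}
    robust_q_update(q, state=0, action="go", reward=1.0, next_state=1,
                    actions=["go", "stay"], sre_value=-10.0)
    print(q)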
Energy- and Cost-Efficient Pumping Station Control
Kanters, Timon V. (University of Amsterdam) | Oliehoek, Frans A. (University of Liverpool and University of Amsterdam) | Kaisers, Michael (Centrum Wiskunde & Informatica) | Bosch, Stan R. van den (Nelen & Schuurmans) | Grispen, Joep (Nelen & Schuurmans) | Hermans, Jeroen (Hoogheemraadschap Hollands Noorderkwartier)
With renewable energy becoming more common, energy prices fluctuate more depending on environmental factors such as the weather. Consuming energy without taking volatile prices into consideration can not only become expensive, but may also increase the peak load, which requires energy providers to generate additional energy using less environment-friendly methods. In the Netherlands, pumping stations that maintain the water levels of polder canals are large energy consumers, but the controller software currently used in the industry does not take real-time energy availability into account. We investigate if existing AI planning techniques have the potential to improve upon the current solutions. In particular, we propose a lightweight but realistic simulator and investigate if an online planning method (UCT) can utilise this simulator to improve the cost-efficiency of pumping station control policies. An empirical comparison with the current control algorithms indicates that substantial cost, and thus peak load, reduction can be attained.
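To illustrate the kind of lightweight simulator an online planner such as UCT can roll forward, the sketch below models a single canal level with rain inflow, an on/off pump, hourly energy prices, and a penalty for leaving the allowed level band. All dynamics, constants, and names are invented for illustration.

    def pumping_step(water_level, hour, pump_on, energy_price, inflow=0.02,
                     pump_rate=0.05, target=0.0, tolerance=0.1, level_penalty=100.0):
        """One step of an illustrative polder simulator: inflow raises the canal level,
        pumping lowers it, and the cost combines the electricity bill with a penalty
        for leaving the allowed level band."""
        next_level = water_level + inflow - (pump_rate if pump_on else 0.0)
        energy_cost = energy_price[hour] * (1.0 if pump_on else 0.0)
        violation = max(0.0, abs(next_level - target) - tolerance)
        cost = energy_cost + level_penalty * violation
        return next_level, (hour + 1) % len(energy_price), cost

    # A planner can roll this simulator forward to prefer pumping in cheap-energy hours
    # while keeping the level inside its band.
    prices = [0.30, 0.25, 0.10, 0.05, 0.08, 0.20]        # illustrative hourly prices
    level, hour, total = 0.0, 0, 0.0
    for pump in [False, False, True, True, False, False]:
        level, hour, cost = pumping_step(level, hour, pump, prices)
        total += cost
    print(round(level, 3), round(total, 3))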
FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes
Kaisers, Michael (Maastricht University) | Tuyls, Karl (Maastricht University)
This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that simulates simultaneous value function updates. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent two-action matrix games. These games can be divided into three types: Matching Pennies, Prisoners' Dilemma and Battle of Sexes. This article shows that Matching Pennies and the Prisoners' Dilemma yield one attractor of the learning dynamics, while the Battle of Sexes exhibits a supercritical pitchfork bifurcation at a critical temperature, where one attractor splits into two attractors and one repellent fixed point. Experiments illustrate that the distance between fixed points of the FAQ-learning dynamics and Nash equilibria tends to zero as the exploration parameter of FAQ-learning approaches zero.
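For reference, a stateless frequency-adjusted update in a 2x2 game scales the learning rate of the chosen action by min(beta / x, 1), where x is that action's current selection probability under a Boltzmann policy. The constants and the toy self-play loop below are illustrative assumptions, not the article's experimental setup.

    import math
    import random

    def boltzmann(q, temperature):
        """Boltzmann (softmax) action probabilities for a two-action agent."""
        exps = [math.exp(v / temperature) for v in q]
        total = sum(exps)
        return [e / total for e in exps]

    def faq_step(q, action, reward, policy, alpha=0.05, beta=0.01):
        """Frequency Adjusted Q-learning update (stateless sketch): the learning rate
        is scaled by min(beta / x, 1), so rarely chosen actions are updated as if all
        actions were updated simultaneously."""
        scale = min(beta / policy[action], 1.0)
        q[action] += scale * alpha * (reward - q[action])

    # Toy self-play in Matching Pennies (payoffs for the row player; values illustrative).
    payoff = [[1, -1], [-1, 1]]
    q_row, q_col = [0.0, 0.0], [0.0, 0.0]
    rng = random.Random(0)
    temperature = 0.1
    for _ in range(5000):
        p_row, p_col = boltzmann(q_row, temperature), boltzmann(q_col, temperature)
        a = 0 if rng.random() < p_row[0] else 1
        b = 0 if rng.random() < p_col[0] else 1
        faq_step(q_row, a, payoff[a][b], p_row)
        faq_step(q_col, b, -payoff[a][b], p_col)
    print([round(p, 2) for p in boltzmann(q_row, temperature)])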
A Cognitive Hierarchy Model Applied to the Lemonade Game
Wunder, Michael (Rutgers University) | Littman, Michael (Rutgers University) | Kaisers, Michael (University of Maastricht) | Yaros, John Robert (Rutgers University)
One of the challenges of multiagent decision making is that the behavior needed to maximize utility can depend on what other agents choose to do: sometimes there is no "right" answer in the absence of knowledge of how opponents will act. The Nash equilibrium is a sensible choice of behavior because it represents a mutual best response. But, even when there is a unique equilibrium, other players are under no obligation to take part in it. This observation has been forcefully illustrated in the behavioral economics community, where repeated experiments have shown individuals playing Nash equilibria and performing badly as a result. In this paper, we show how to apply a tool from behavioral economics called the Cognitive Hierarchy (CH) to the design of agents in general-sum games. We attack the recently introduced "Lemonade Game" and show how the results of an open competition are well explained by CH. We believe this game, and perhaps many other similar games, boils down to predicting how deeply other agents in the game will be reasoning. An agent that does not reason enough risks being exploited by its opponents, while an agent that reasons too much may not be able to interact productively with its opponents. We demonstrate these ideas by presenting empirical results using agents from the competition and idealizations arising from a CH analysis.
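The CH model itself is easy to state: level-0 players act uniformly at random, and each level-k player best-responds to a Poisson-weighted mixture of the levels below it. The sketch below computes these strategies for a small symmetric matrix game; the payoffs and parameter values are illustrative and unrelated to the Lemonade Game.

    import math
    import numpy as np

    def cognitive_hierarchy(payoff, max_level=4, tau=1.5):
        """Cognitive Hierarchy strategies for a symmetric matrix game (a sketch of the
        Camerer-Ho-Chong model; parameter values are illustrative). Level 0 plays
        uniformly at random; level k best-responds to a Poisson-weighted mixture of
        the levels below it."""
        n_actions = payoff.shape[0]
        poisson = np.array([math.exp(-tau) * tau**k / math.factorial(k)
                            for k in range(max_level + 1)])
        strategies = [np.full(n_actions, 1.0 / n_actions)]    # level 0: uniform
        for k in range(1, max_level + 1):
            weights = poisson[:k] / poisson[:k].sum()           # beliefs over lower levels
            belief = sum(w * s for w, s in zip(weights, strategies))
            expected = payoff @ belief                          # value of each action
            best = np.zeros(n_actions)
            best[np.argmax(expected)] = 1.0                     # pure best response
            strategies.append(best)
        return strategies

    # Toy 3-action symmetric game (payoffs are illustrative, not the Lemonade Game).
    payoff = np.array([[0.0, 2.0, 1.0],
                       [1.0, 0.0, 2.0],
                       [2.0, 1.0, 0.0]])
    for level, strategy in enumerate(cognitive_hierarchy(payoff)):
        print(level, strategy)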