AITopics | Dennis, Michael

Dennis, Michael

Multi-Agent Risks from Advanced AI

Hammond, Lewis, Chan, Alan, Clifton, Jesse, Hoelscher-Obermaier, Jason, Khan, Akbir, McLean, Euan, Smith, Chandler, Barfuss, Wolfram, Foerster, Jakob, Gavenčiak, Tomáš, Han, The Anh, Hughes, Edward, Kovařík, Vojtěch, Kulveit, Jan, Leibo, Joel Z., Oesterheld, Caspar, de Witt, Christian Schroeder, Shah, Nisarg, Wellman, Michael, Bova, Paolo, Cimpeanu, Theodor, Ezell, Carson, Feuillade-Montixi, Quentin, Franklin, Matija, Kran, Esben, Krawczuk, Igor, Lamparth, Max, Lauffer, Niklas, Meinke, Alexander, Motwani, Sumeet, Reuel, Anka, Conitzer, Vincent, Dennis, Michael, Gabriel, Iason, Gleave, Adam, Hadfield, Gillian, Haghtalab, Nika, Kasirzadeh, Atoosa, Krier, Sébastien, Larson, Kate, Lehman, Joel, Parkes, David C., Piliouras, Georgios, Rahwan, Iyad

arXiv.org Artificial Intelligence, Feb-19-2025

The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.

artificial intelligence, machine learning, survey article, (21 more...)

arXiv.org Artificial Intelligence

2502.14143

Country:

Asia (1.00)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County (0.13)
North America > Canada > Ontario > Toronto (0.13)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment > Games (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

The Benefits of Power Regularization in Cooperative Reinforcement Learning

Li, Michelle, Dennis, Michael

arXiv.org Artificial Intelligence, Jun-17-2024

Cooperative Multi-Agent Reinforcement Learning (MARL) algorithms, trained only to optimize task reward, can lead to a concentration of power where the failure or adversarial intent of a single agent could decimate the reward of every agent in the system. In the context of teams of people, it is often useful to explicitly consider how power is distributed to ensure no person becomes a single point of failure. Here, we argue that explicitly regularizing the concentration of power in cooperative RL systems can result in systems that are more robust to single-agent failure, adversarial attacks, and incentive changes of co-players. To this end, we define a practical pairwise measure of power that captures the ability of any co-player to influence the ego agent's reward, and then propose a power-regularized objective which balances task reward and power concentration. Given this new objective, we show that there always exists an equilibrium where every agent is playing a power-regularized best-response balancing power and task reward. Moreover, we present two algorithms for training agents towards this power-regularized objective: Sample Based Power Regularization (SBPR), which injects adversarial data during training; and Power Regularization via Intrinsic Motivation (PRIM), which adds an intrinsic motivation to regulate power to the training objective. Our experiments demonstrate that both algorithms successfully balance task reward and power, leading to lower-power behavior than the task-only reward baseline and avoiding catastrophic events when an agent in the system goes off-policy.
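The trade-off this abstract describes, between task reward and a co-player's power over the ego agent, can be illustrated with a toy one-shot game. The sketch below is only an illustration under stated assumptions: the reward table, the action names, the specific power measure (reward lost if the co-player switches to its most harmful action), and the weight `lam` are all hypothetical and are not taken from the paper, SBPR, or PRIM.

```python
# Toy two-player cooperative game. All numbers and names are hypothetical.
# REWARD[(ego_action, co_action)] is the shared task reward.
REWARD = {
    ("rely", "help"): 10, ("rely", "defect"): 0,
    ("solo", "help"): 7,  ("solo", "defect"): 6,
}
EGO_ACTIONS = ("rely", "solo")
CO_ACTIONS = ("help", "defect")


def power_over_ego(ego_action, co_action):
    """Assumed power measure (not the paper's definition): how much reward
    the ego agent loses if the co-player deviates to its most harmful action."""
    worst = min(REWARD[(ego_action, a)] for a in CO_ACTIONS)
    return REWARD[(ego_action, co_action)] - worst


def regularized_value(ego_action, co_action, lam):
    """Task reward minus lam times the co-player's power over the ego agent."""
    task = REWARD[(ego_action, co_action)]
    return task - lam * power_over_ego(ego_action, co_action)


def best_response(co_action, lam):
    """Ego action that maximizes the regularized value against co_action."""
    return max(EGO_ACTIONS, key=lambda a: regularized_value(a, co_action, lam))


if __name__ == "__main__":
    for lam in (0.0, 0.5):
        print(lam, best_response("help", lam))
```

In this toy game, `lam = 0.0` makes the ego agent pick `rely`, which earns the most task reward but leaves it fully exposed to a defecting co-player, whereas `lam = 0.5` makes it pick `solo`, which gives up some reward for a much smaller worst-case loss.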

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2406.1124

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Refining Minimax Regret for Unsupervised Environment Design

Beukman, Michael, Coward, Samuel, Matthews, Michael, Fellows, Mattie, Jiang, Minqi, Dennis, Michael, Foerster, Jakob

arXiv.org Artificial Intelligence, Jun-8-2024

In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent's maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there are possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, ReMiDi, that results in a BLP policy at convergence. We empirically demonstrate that training on levels from a minimax regret adversary causes learning to prematurely stagnate, but that ReMiDi continues learning.
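As a rough illustration of the minimax regret objective this abstract refers to, the sketch below computes regret on a small table of returns and picks the policy whose worst-case regret is smallest. The policies, levels, and returns are made-up numbers, and regret is measured against the best policy in the table rather than a true optimal policy for each level; none of this is taken from the paper or from ReMiDi.

```python
# Hypothetical returns: RETURNS[policy][level]. Names and numbers are illustrative.
RETURNS = {
    "cautious": {"easy": 9,  "maze": 5, "trap": 2},
    "greedy":   {"easy": 10, "maze": 8, "trap": 0},
    "balanced": {"easy": 9,  "maze": 7, "trap": 1},
}


def regret(policy, level):
    """Best return achieved by any tabulated policy on this level,
    minus the return of the given policy (a simplifying assumption)."""
    best = max(returns[level] for returns in RETURNS.values())
    return best - RETURNS[policy][level]


def max_regret(policy):
    """Worst-case regret of a policy over all levels."""
    return max(regret(policy, level) for level in RETURNS[policy])


def minimax_regret_policy():
    """Policy that minimizes the worst-case regret."""
    return min(RETURNS, key=max_regret)


def adversary_levels(policy):
    """Levels a regret-maximizing adversary would sample against this policy."""
    worst = max_regret(policy)
    return [level for level in RETURNS[policy] if regret(policy, level) == worst]


if __name__ == "__main__":
    print(minimax_regret_policy(), max_regret(minimax_regret_policy()))
    print(adversary_levels("greedy"))
```

Here `balanced` is the minimax regret policy with a maximum regret of 1, while an adversary facing `greedy` would sample only `trap`. That second result hints at the stagnation issue the abstract raises: levels that do not maximize regret stop being sampled, even if the policy could still improve on them.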

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2402.12284

Country:

North America > United States (0.93)
North America > Canada > British Columbia (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Austria > Vienna (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Education > Educational Setting > Continuing Education (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Open-Endedness is Essential for Artificial Superhuman Intelligence

Hughes, Edward, Dennis, Michael, Parker-Holder, Jack, Behbahani, Feryal, Mavalankar, Aditi, Shi, Yuge, Schaul, Tom, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Jun-6-2024

In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve open-endedness in AI systems with respect to a human observer. Furthermore, we claim that such open-endedness is an essential property of any artificial superhuman intelligence (ASI). We begin by providing a concrete formal definition of open-endedness through the lens of novelty and learnability. We then illustrate a path towards ASI via open-ended systems built on top of foundation models, capable of making novel, human-relevant discoveries. We conclude by examining the safety implications of generally-capable open-ended AI. We expect that open-ended foundation models will prove to be an increasingly fertile and safety-critical area of research in the near future.

evolutionary algorithm, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2406.04268

Country:

North America > United States > Maryland (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Information Technology (0.68)
Leisure & Entertainment > Games > Computer Games (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Genie: Generative Interactive Environments

Bruce, Jake, Dennis, Michael, Edwards, Ashley, Parker-Holder, Jack, Shi, Yuge, Hughes, Edward, Lai, Matthew, Mavalankar, Aditi, Steigerwald, Richie, Apps, Chris, Aytar, Yusuf, Bechtle, Sarah, Behbahani, Feryal, Chan, Stephanie, Heess, Nicolas, Gonzalez, Lucy, Osindero, Simon, Ozair, Sherjil, Reed, Scott, Zhang, Jingwei, Zolna, Konrad, Clune, Jeff, de Freitas, Nando, Singh, Satinder, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Feb-23-2024

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.15391

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

minimax: Efficient Baselines for Autocurricula in JAX

Jiang, Minqi, Dennis, Michael, Grefenstette, Edward, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Nov-23-2023

Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120× speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2311.12716

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Evolving Curricula with Regret-Based Environment Design

Parker-Holder, Jack, Jiang, Minqi, Dennis, Michael, Samvelyan, Mikayel, Foerster, Jakob, Grefenstette, Edward, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Sep-30-2023

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at accelagent.github.io.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2203.01302

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (0.92)
Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Stabilizing Unsupervised Environment Design with a Learned Adversary

Mediratta, Ishita, Jiang, Minqi, Parker-Holder, Jack, Dennis, Michael, Vinitsky, Eugene, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Aug-22-2023

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2308.10797

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.67)

Genre: Research Report > Promising Solution (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Who Needs to Know? Minimal Knowledge for Optimal Coordination

Lauffer, Niklas, Shah, Ameesh, Carroll, Micah, Dennis, Michael, Russell, Stuart

arXiv.org Artificial Intelligence, Jul-13-2023

To optimally coordinate with others in cooperative games, it is often crucial to have information about one's collaborators: successful driving requires understanding which side of the road to drive on. However, not every feature of collaborators is strategically relevant: the fine-grained acceleration of drivers may be ignored while maintaining optimal coordination. We show that there is a well-defined dichotomy between strategically ...

If much of the information is irrelevant, it's easy to imagine how this could lead to significant increases in efficiency for finding optimal policies. For example, this could allow a focused effort on few-shot or zero-shot adaptation to co-players (Zand et al., 2022; Albrecht & Stone, 2017; Stone et al., 2010; Hu et al., 2020) or more efficient Dec-POMDP planning algorithms (Szer & Charpillet, 2006; Seuken & Zilberstein, 2007). In order to leverage these benefits, we build the theory, data structures, and algorithms required to distinguish between relevant and irrelevant information.

artificial intelligence, game theory, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.09309

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

Samvelyan, Mikayel, Khan, Akbir, Dennis, Michael, Jiang, Minqi, Parker-Holder, Jack, Foerster, Jakob, Raileanu, Roberta, Rocktäschel, Tim

arXiv.org Artificial Intelligence, Mar-6-2023

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over either environment parameters (in single-agent settings) or co-player policies (in multi-agent settings). However, the strengths and weaknesses of co-players can manifest themselves differently depending on environmental features. It is thus crucial to consider the dependency between the environment and co-player when shaping a curriculum in multi-agent domains. In this work, we use this insight and extend Unsupervised Environment Design (UED) to multi-agent environments. We then introduce Multi-Agent Environment Design Strategist for Open-Ended Learning (MAESTRO), the first multi-agent UED approach for two-player zero-sum settings. MAESTRO efficiently produces adversarial, joint curricula over both environments and co-players and attains minimax-regret guarantees at Nash equilibrium. Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games, spanning discrete and continuous control settings.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2303.03376

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports > Motorsports > Formula One (1.00)
Leisure & Entertainment > Games (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
