A Generalist Hanabi Agent
Arjun V. Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar
arXiv.org Artificial Intelligence
Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems perform poorly in settings other than the one they were trained on, and struggle to cooperate successfully with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card game that requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can each learn only one specific game setting (e.g., 2-player games), and play only with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation- and action-space. In doing so, our agent is the first that can play all game settings concurrently and extend strategies learned in one setting to others. As a consequence, our agent also demonstrates the ability to collaborate with different algorithmic agents, agents that are themselves unable to do so.

Humans have been able to thrive as a society through their ability to cooperate. Interactions among multiple people or agents are essential components of various aspects of our lives, ranging from everyday activities like commuting to work to the functioning of fundamental institutions like governments and economic markets. Through repeated interactions, humans come to understand their partners and learn to reason from their perspective. Crucially, humans can generalize this reasoning to novel partners in different situations.
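The key idea above, rendering observations and actions as text so that the action set can vary in size across game settings, can be illustrated with a minimal sketch. The observation schema, function names, and the bag-of-words scorer below are illustrative assumptions for exposition only; they are not the paper's R3D2 architecture, which uses learned language representations.

```python
from collections import Counter

def obs_to_text(obs):
    """Render a toy Hanabi-style observation as text.
    The field names here are an assumed, simplified schema."""
    hints = ", ".join(obs["hints"]) or "none"
    return (f"fireworks: {' '.join(obs['fireworks'])}. "
            f"hint tokens: {obs['hint_tokens']}. "
            f"hints received: {hints}.")

def bow(text):
    """Bag-of-words features; a stand-in for a learned text encoder."""
    return Counter(text.lower().replace(".", "").replace(",", "").split())

def score_actions(obs_text, action_texts):
    """Score each candidate action against the observation.
    Because actions are plain text, the candidate set can change
    size freely between 2-, 3-, 4-, or 5-player settings."""
    o = bow(obs_text)
    return [sum(o[w] * c for w, c in bow(a).items()) for a in action_texts]

obs = {"fireworks": ["red 1", "blue 0"],
       "hint_tokens": 7,
       "hints": ["your card 2 is red"]}
text = obs_to_text(obs)
# The same scorer handles action sets of any size.
scores = score_actions(text, ["play card 1", "hint red to player 2"])
```

A learned model would replace `bow` and the dot-product scorer with a neural text encoder, but the interface stays the same: one score per candidate action string, with no fixed action-space dimension baked into the network.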
Artificial agents should be able to do the same, to enable successful collaboration in artificial and hybrid systems (Dafoe et al., 2020). The problem of multi-agent cooperation therefore fits naturally into the multi-agent reinforcement learning (MARL) paradigm, in which artificial agents learn to collaborate through repeated interactions, in the same principled manner humans would. Within MARL, the game of Hanabi has emerged as a popular benchmark for assessing the cooperative abilities of learning agents (Bard et al., 2020).
Mar-17-2025
- Country:
- North America > Canada > Quebec (0.14)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Leisure & Entertainment > Games (1.00)