Human-AI Coordination via Human-Regularized Search and Learning

Hu, Hengyuan, Wu, David J, Lerer, Adam, Foerster, Jakob, Brown, Noam

Oct-10-2022–arXiv.org Artificial Intelligence

We consider the problem of building AI agents that collaborate well with humans in partially observable, fully cooperative environments, given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far from it, we develop a three-step algorithm that achieves strong performance in coordinating with real humans on the Hanabi benchmark. First, we use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Second, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large-scale experiments with humans. First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. Second, by having experts play repeatedly with the two agents, we show that our method beats a baseline that is a vanilla best response to behavioral cloning.

One of the most fundamental goals of artificial intelligence research, especially multi-agent research, is to produce agents that can successfully collaborate with humans to achieve common goals. Although search and reinforcement learning (RL) from scratch, without human knowledge, have achieved impressive superhuman performance in competitive games (Silver et al., 2017; Brown & Sandholm, 2019), prior work (Hu et al., 2020; Carroll et al., 2019) has shown that agents produced by vanilla multi-agent reinforcement learning do not collaborate well with humans.
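The abstract does not spell out the regularized policy itself, so the following is only a minimal sketch of the general idea of staying close to a behavioral cloning policy while preferring higher-value actions. It assumes the common KL-regularized form, in which each action's probability is proportional to the anchor (behavioral cloning) probability times exp(Q / lam). The function name `kl_regularized_policy`, the Q-value estimates, and the regularization weight `lam` are illustrative assumptions, not names or values taken from the paper.

```python
import math


def kl_regularized_policy(anchor_probs, q_values, lam):
    """Return pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    anchor_probs: probabilities from a behavioral cloning policy (assumed input).
    q_values: estimated action values, e.g. from search (assumed input).
    lam: regularization weight; larger values keep pi closer to the anchor.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(anchor_probs) != len(q_values):
        raise ValueError("anchor_probs and q_values must have the same length")

    # Work in log space for numerical stability. Actions the anchor never
    # plays (probability 0) keep probability 0 in the regularized policy.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for p, q in zip(anchor_probs, q_values)
    ]
    max_logit = max(logits)
    if max_logit == float("-inf"):
        raise ValueError("anchor_probs must contain at least one positive entry")

    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-action example: the anchor prefers action 0,
# while the value estimates prefer action 1.
anchor = [0.7, 0.2, 0.1]
q = [1.0, 2.0, 0.0]
close_to_human = kl_regularized_policy(anchor, q, lam=1e6)
close_to_greedy = kl_regularized_policy(anchor, q, lam=0.01)
```

With a very large `lam` the result stays near the anchor distribution, and with a very small `lam` it concentrates on the highest-value action that the anchor assigns nonzero probability. How the paper chooses this trade-off in each of its three steps is not stated in the abstract.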

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - Louisiana > Orleans Parish > New Orleans (0.04)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)

Genre:
- Research Report (0.83)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)