Human-AI Coordination via Human-Regularized Search and Learning
Hu, Hengyuan, Wu, David J, Lerer, Adam, Foerster, Jakob, Brown, Noam
arXiv.org Artificial Intelligence
We consider the problem of making AI agents that collaborate well with humans in partially observable, fully cooperative environments, given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far from it, we develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark. We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Then, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large-scale experiments with humans. First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. Second, by having experts play repeatedly with the two agents, we show that our method beats a baseline that is a vanilla best response to behavioral cloning.

One of the most fundamental goals of artificial intelligence research, especially multi-agent research, is to produce agents that can successfully collaborate with humans to achieve common goals. Although search and reinforcement learning (RL) from scratch, without human knowledge, have achieved impressive superhuman performance in competitive games (Silver et al., 2017; Brown & Sandholm, 2019), prior works (Hu et al., 2020; Carroll et al., 2019) have shown that agents produced by vanilla multi-agent reinforcement learning do not collaborate well with humans.
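To make the policy-regularization idea from the abstract concrete, here is a minimal sketch of the standard closed-form solution to a KL-regularized action choice: maximizing expected value minus `lam` times the KL divergence from an anchor policy gives a policy proportional to `anchor(a) * exp(Q(a) / lam)`. This is an illustration under simplifying assumptions (a single decision point with known action values), not the authors' implementation. The function name, the parameter names, and the example numbers are all hypothetical, and the actual search procedure used for Hanabi is considerably more involved.

```python
import math


def kl_regularized_policy(q_values, anchor_probs, lam):
    """Return pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    q_values:     estimated value of each action (hypothetical inputs)
    anchor_probs: behavioral-cloning (anchor) probability of each action
    lam:          regularization strength; larger keeps pi closer to the anchor
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(q_values) != len(anchor_probs):
        raise ValueError("q_values and anchor_probs must have equal length")

    # Work in log space for numerical stability. Actions the anchor never
    # plays (probability zero) keep probability zero.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for q, p in zip(q_values, anchor_probs)
    ]
    max_logit = max(logits)
    if max_logit == float("-inf"):
        raise ValueError("anchor_probs must contain a positive entry")

    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative numbers only: three actions, anchor prefers action 0,
# while the value estimates prefer action 1.
q = [1.0, 2.0, 0.0]
anchor = [0.7, 0.2, 0.1]
human_like = kl_regularized_policy(q, anchor, lam=1000.0)  # stays near anchor
greedy_like = kl_regularized_policy(q, anchor, lam=0.01)   # follows the values
```

The single parameter `lam` trades off the two goals named in the abstract: with a large `lam` the result stays close to the behavioral cloning policy, and with a small `lam` it concentrates on the highest-value action.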
Oct-10-2022
- Country:
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Genre:
- Research Report (0.83)
- Industry:
- Leisure & Entertainment > Games (1.00)