Enabling Multi-Robot Collaboration from Single-Human Guidance

Ji, Zhengran, Zhang, Lingyu, Sajda, Paul, Chen, Boyuan

arXiv.org Artificial Intelligence

Abstract -- Learning collaborative behaviors is essential for multi-agent systems. Traditionally, multi-agent reinforcement learning solves this implicitly through a joint reward and centralized observations, assuming that collaborative behavior will emerge. Other studies propose to learn from demonstrations of a group of collaborative experts. Instead, we propose an efficient and explicit way of learning collaborative behaviors in multi-agent systems by leveraging expertise from only a single human. Our insight is that humans can naturally take on various roles in a team. We show that agents can effectively learn to collaborate by allowing a human operator to dynamically switch between controlling agents for short periods and by incorporating a human-like theory-of-mind model of teammates. Our experiments show that our method improves the success rate on a challenging collaborative hide-and-seek task by up to 58% with only 40 minutes of single-human guidance. The best policy achieves an average seeker success rate of 84.2% in simulation and 80% in real-world experiments in a challenging 3-seekers-versus-3-hiders setting with random map layouts; the baseline policy achieves only 36.4% in simulation and 55% in the real world. Interesting collaborative behaviors among seekers are observed during deployment, such as strategically navigating to anticipate and intercept hiders or effectively blocking key paths as a team.
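The abstract names two mechanisms: rotating single-human control across agents, and a theory-of-mind model of teammates that policies can condition on. Below is a minimal sketch of what those components might look like, assuming a PyTorch setup; all class and function names are illustrative, not the authors' code.

```python
# Illustrative sketch (not the authors' code) of the two mechanisms the
# abstract describes: rotating single-human control across agents, and a
# theory-of-mind (ToM) model that predicts teammates' next actions so a
# policy can condition on them.
import torch
import torch.nn as nn

def controlled_agent(step: int, n_agents: int, switch_period: int = 100) -> int:
    """Index of the agent the single human operator controls at this step;
    control rotates to the next agent every `switch_period` steps."""
    return (step // switch_period) % n_agents

class TeammateModel(nn.Module):
    """Predicts each teammate's next-action logits from joint observations."""
    def __init__(self, obs_dim: int, act_dim: int, n_teammates: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_teammates, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim * n_teammates),
        )
        self.n_teammates, self.act_dim = n_teammates, act_dim

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        logits = self.net(joint_obs)                    # (B, n_teammates*act_dim)
        return logits.view(-1, self.n_teammates, self.act_dim)

tom = TeammateModel(obs_dim=8, act_dim=5, n_teammates=2)
predictions = tom(torch.randn(4, 2 * 8))                # (4, 2, 5) action logits
human_turn = controlled_agent(step=250, n_agents=3)     # -> agent 2 is human-controlled
```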


Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning

Chen, Jiayu, Xu, Zelai, Li, Yunfei, Yu, Chao, Song, Jiaming, Yang, Huazhong, Fang, Fei, Wang, Yu, Wu, Yi

arXiv.org Artificial Intelligence

Learning Nash equilibrium (NE) in complex zero-sum games with multi-agent reinforcement learning (MARL) can be extremely computationally expensive. Curriculum learning is an effective way to accelerate learning, but an under-explored dimension for generating a curriculum is the difficulty-to-learn of the subgames -- games induced by starting from a specific state. In this work, we present a novel subgame curriculum learning framework for zero-sum games. It adopts an adaptive initial state distribution by resetting agents to previously visited states where they can quickly learn to improve performance. Building upon this framework, we derive a subgame selection metric that approximates the squared distance to NE values and further adopt a particle-based state sampler for subgame generation. Integrating these techniques yields our new algorithm, Subgame Automatic Curriculum Learning (SACL), a realization of the subgame curriculum learning framework that can be combined with any MARL algorithm, such as MAPPO. Experiments in the particle-world and Google Research Football environments show that SACL produces much stronger policies than the baselines. In the challenging hide-and-seek quadrant environment, SACL produces all four emergent stages using only half the samples of MAPPO with self-play. The project website is at https://sites.google.com/view/sacl-rl.
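To make the curriculum mechanics concrete, here is a minimal, self-contained sketch of the subgame-sampling idea: keep a particle buffer of previously visited states, weight each by a proxy for its squared distance to the NE value, and reset episodes to states sampled in proportion to that weight. Using the squared change of the learned value estimate between checkpoints as the proxy is an assumption for illustration, not necessarily SACL's exact metric.

```python
# Hedged sketch of a subgame curriculum sampler. The weight uses the
# squared change in the learned value between checkpoints as a stand-in
# for "squared distance to the NE value" (an assumption, not SACL's
# published metric).
import random

class SubgameSampler:
    def __init__(self, capacity: int = 10_000):
        self.states, self.weights = [], []
        self.capacity = capacity

    def add(self, state, v_old: float, v_new: float) -> None:
        # States whose value estimate is still moving are presumed far
        # from equilibrium, so they make the most useful subgames.
        w = (v_new - v_old) ** 2
        if len(self.states) >= self.capacity:        # evict a random particle
            i = random.randrange(len(self.states))
            self.states[i], self.weights[i] = state, w
        else:
            self.states.append(state)
            self.weights.append(w)

    def sample_initial_state(self):
        return random.choices(self.states, weights=self.weights, k=1)[0]

# Usage: mix curriculum resets with ordinary resets from the task's
# initial-state distribution so training still covers the full game.
sampler = SubgameSampler()
sampler.add(state={"pos": (1, 2)}, v_old=0.1, v_new=0.7)    # value still moving
sampler.add(state={"pos": (5, 5)}, v_old=0.4, v_new=0.42)   # nearly converged
print(sampler.sample_initial_state())   # usually the fast-moving state
```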


Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

Li, Qian, Hu, Yuxiao, Dong, Yinpeng, Zhang, Dongxiao, Chen, Yuntian

arXiv.org Artificial Intelligence

Adversarial training is often formulated as a min-max problem; however, concentrating only on the worst adversarial examples causes the model to oscillate: previously defended or correctly classified samples become indefensible or misclassified in subsequent rounds of adversarial training. We characterize such non-ignorable samples as "hiders", which reveal hidden high-risk regions within the seemingly secure area obtained through adversarial training and prevent the model from finding the true worst cases. We require the model to defend against hiders as well as adversarial examples, improving accuracy and robustness simultaneously. By rethinking and redefining the min-max optimization problem for adversarial training, we propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT). HFAT introduces an iterative evolution optimization strategy to simplify the optimization problem and employs an auxiliary model to reveal hiders, effectively combining the optimization directions of standard adversarial training and hider prevention. Furthermore, we introduce an adaptive weighting mechanism that lets the model adjust its focus between adversarial examples and hiders during different training periods. Extensive experiments demonstrate the effectiveness of our method and show that HFAT provides higher robustness and accuracy.
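As one concrete reading of the adaptive weighting idea, here is a hedged sketch of a combined objective: a standard adversarial loss blended with a loss on hider samples under a weight that shifts over training. How hiders are mined via the auxiliary model, and the linear schedule used here, are assumptions for illustration, not the published HFAT procedure.

```python
# Hedged sketch of a blended adversarial/hider objective. The linear
# weighting schedule and the assumption that hider batches arrive
# pre-mined are illustrative; this is not HFAT's published algorithm.
import torch
import torch.nn.functional as F

def hfat_style_loss(model, x_adv, x_hider, y, epoch, total_epochs):
    """Blend the worst-case adversarial loss with a loss on 'hiders':
    samples that were previously defended but became vulnerable again."""
    loss_adv = F.cross_entropy(model(x_adv), y)
    loss_hider = F.cross_entropy(model(x_hider), y)
    # Assumed schedule: focus shifts toward hiders as training progresses.
    lam = min(1.0, epoch / total_epochs)
    return (1.0 - lam) * loss_adv + lam * loss_hider

# Tiny runnable demo with a linear classifier and synthetic batches.
model = torch.nn.Linear(10, 3)
x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))
loss = hfat_style_loss(model, x, x + 0.01 * torch.randn_like(x), y,
                       epoch=5, total_epochs=20)
loss.backward()
```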


Replication of Multi-agent Reinforcement Learning for the "Hide and Seek" Problem

Kamal, Haider, Niazi, Muaz A., Afzal, Hammad

arXiv.org Artificial Intelligence

Reinforcement learning generates policies based on reward functions and hyperparameters, and slight changes in these can significantly affect results. The lack of documentation and reproducibility in reinforcement learning research makes it difficult to replicate once-deduced strategies. While previous research has identified strategies using grounded maneuvers, there is limited work in more complex environments. The agents in this study are simulated similarly to OpenAI's hider and seeker agents, with an added flying mechanism that enhances their mobility and expands their range of possible actions and strategies. This added functionality reduces the steps the hider agents need to develop a chasing strategy from approximately 2 million to 1.6 million steps and hiders
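Since this entry's complaint is precisely that undocumented seeds and hyperparameters make such results hard to replicate, a minimal sketch of the standard mitigation follows (common practice, not code from the paper): pin every random seed and write the full configuration next to the results.

```python
# Generic reproducibility hygiene of the kind this replication argues for:
# pin every seed and record the exact configuration that produced a run.
# This is standard practice, not code from the paper.
import json, random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

config = {"seed": 42, "lr": 3e-4, "gamma": 0.99, "entropy_coef": 0.01}
seed_everything(config["seed"])
with open("run_config.json", "w") as f:
    json.dump(config, f, indent=2)   # document what produced the result
```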


AutoDIME: Automatic Design of Interesting Multi-Agent Environments

Kanitscheider, Ingmar, Edwards, Harri

arXiv.org Machine Learning

Designing a distribution of environments in which RL agents can learn interesting and useful skills is a challenging and poorly understood task, and in multi-agent environments the difficulties are only exacerbated. One approach is to train a second RL agent, called a teacher, that samples environments conducive to the learning of student agents. However, most previous proposals for teacher rewards do not generalize straightforwardly to the multi-agent setting. We examine a set of intrinsic teacher rewards derived from prediction problems that can be applied in multi-agent settings and evaluate them in MuJoCo tasks such as multi-agent Hide and Seek [1] as well as a diagnostic single-agent maze task. Of the intrinsic rewards considered, we found value disagreement to be the most consistent across tasks, leading to faster and more reliable emergence of advanced skills in Hide and Seek and the maze task. Another candidate, value prediction error, also worked well in Hide and Seek but was susceptible to noisy-TV-style distractions in stochastic environments. Policy disagreement performed well in the maze task but did not speed up learning in Hide and Seek. Our results suggest that intrinsic teacher rewards, and in particular value disagreement, are a promising approach for automating both single- and multi-agent environment design.
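A minimal sketch of how a value-disagreement teacher reward could be computed: train an ensemble of value functions on the student's data and reward the teacher for proposing environments where the ensemble disagrees. The ensemble size and the use of standard deviation across heads as the disagreement measure are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Hedged sketch of a value-disagreement teacher reward: high spread
# across an ensemble of value heads marks environments at the frontier
# of what the student has learned.
import torch
import torch.nn as nn

class ValueEnsemble(nn.Module):
    def __init__(self, obs_dim: int, n_heads: int = 5, hidden: int = 64):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_heads)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:     # (B, obs_dim)
        return torch.cat([h(obs) for h in self.heads], dim=-1)  # (B, n_heads)

def teacher_reward(ensemble: ValueEnsemble, obs: torch.Tensor) -> torch.Tensor:
    """Mean disagreement (std across heads) over states sampled from the
    proposed environment."""
    with torch.no_grad():
        values = ensemble(obs)
    return values.std(dim=-1).mean()

ensemble = ValueEnsemble(obs_dim=16)
print(teacher_reward(ensemble, torch.randn(32, 16)))  # scalar disagreement
```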


Why teaching robots to play hide-and-seek could be the key to next-gen A.I.

#artificialintelligence

Artificial general intelligence, the idea of an intelligent A.I. agent that's able to understand and learn any intellectual task that humans can do, has long been a component of science fiction. As A.I. gets smarter and smarter -- especially with breakthroughs in machine learning tools that are able to rewrite their code to learn from new experiences -- it's increasingly a part of real artificial intelligence conversations as well. But how do we measure AGI when it does arrive? Over the years, researchers have laid out a number of possibilities. The most famous remains the Turing Test, in which a human judge interacts, sight unseen, with both humans and a machine, and must try to guess which is which.


A Classical Search Game in Discrete Locations

Clarkson, Jake, Lin, Kyle Y., Glazebrook, Kevin D.

arXiv.org Machine Learning

Consider a two-person zero-sum search game between a hider and a searcher. The hider hides among $n$ discrete locations, and the searcher successively visits individual locations until finding the hider. Known to both players, a search at location $i$ takes $t_i$ time units and detects the hider -- if hidden there -- independently with probability $q_i$, for $i=1,\ldots,n$. The hider aims to maximize the expected time until detection, while the searcher aims to minimize it. We prove the existence of an optimal strategy for each player. In particular, the hider's optimal mixed strategy hides in each location with a nonzero probability, and the searcher's optimal mixed strategy can be constructed with up to $n$ simple search sequences. We develop an algorithm to compute an optimal strategy for each player, and compare the optimal hiding strategy with the simple hiding strategy which gives the searcher no location preference at the beginning of the search.
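For intuition on the searcher's side of this problem: against a fixed, known hiding distribution, a classical index rule searches the location maximizing $p_i q_i / t_i$ and Bayes-updates the posterior after each failed search. The self-contained sketch below illustrates that rule; the game-optimal mixed strategies constructed in the paper, where the hider also plays strategically, are more involved.

```python
# Classical index rule for search against a FIXED hiding distribution:
# always search the location with the highest detection rate per unit
# time, p_i * q_i / t_i, then Bayes-update the posterior after a miss.
# This sketches the searcher's side only, not the paper's game-optimal
# mixed strategies.
def next_location(p, q, t):
    """Location with the highest detection probability per unit time."""
    return max(range(len(p)), key=lambda i: p[i] * q[i] / t[i])

def failed_search_update(p, q, j):
    """Posterior over hiding locations after an unsuccessful search of j."""
    miss = 1.0 - p[j] * q[j]
    return [(p[i] * (1.0 - q[i]) if i == j else p[i]) / miss
            for i in range(len(p))]

p = [0.5, 0.3, 0.2]      # prior over 3 locations
q = [0.6, 0.9, 0.8]      # detection probability per search
t = [2.0, 1.0, 1.0]      # time units per search
j = next_location(p, q, t)            # -> 1 (0.3*0.9/1.0 = 0.27 is largest)
p = failed_search_update(p, q, j)     # posterior mass shifts away from 1
```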


Hide-and-Seek: A Template for Explainable AI

Tagaris, Thanos, Stafylopatis, Andreas

arXiv.org Artificial Intelligence

Lack of transparency has been the Achilles' heel of Neural Networks and a barrier to their wider adoption in industry. Despite significant interest, this shortcoming has not been adequately addressed. This study proposes a novel framework called Hide-and-Seek (HnS) for training interpretable Neural Networks and establishes a theoretical foundation for exploring and comparing similar ideas. Extensive experimentation indicates that a high degree of interpretability can be built into Neural Networks without sacrificing their predictive power.