Goto

Collaborating Authors

 Agents


Learning Practical Communication Strategies in Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

In Multi-Agent Reinforcement Learning, communication is critical to encourage cooperation among agents. Communication in realistic wireless networks can be highly unreliable due to network conditions varying with agents' mobility, and stochasticity in the transmission process. We propose a framework to learn practical communication strategies by addressing three fundamental questions: (1) When: Agents learn the timing of communication based on not only message importance but also wireless channel conditions. (2) What: Agents augment message contents with wireless network measurements to better select the game and communication actions. (3) How: Agents use a novel neural message encoder to preserve all information from received messages, regardless of the number and order of messages. Simulating standard benchmarks under realistic wireless network settings, we show significant improvements in game performance, convergence speed and communication efficiency compared with state-of-the-art.


Congested Urban Networks Tend to Be Insensitive to Signal Settings: Implications for Learning-Based Control

arXiv.org Artificial Intelligence

This paper highlights several properties of large urban networks that can have an impact on machine learning methods applied to traffic signal control. In particular, we show that the average network flow tends to be independent of the signal control policy as density increases. This property, which so far has remained under the radar, implies that deep reinforcement learning (DRL) methods becomes ineffective when trained under congested conditions, and might explain DRL's limited success for traffic signal control. Our results apply to all possible grid networks thanks to a parametrization based on two network parameters: the ratio of the expected distance between consecutive traffic lights to the expected green time, and the turning probability at intersections. Networks with different parameters exhibit very different responses to traffic signal control. Notably, we found that no control (i.e. random policy) can be an effective control strategy for a surprisingly large family of networks. The impact of the turning probability turned out to be very significant both for baseline and for DRL policies. It also explains the loss of symmetry observed for these policies, which is not captured by existing theories that rely on corridor approximations without turns. Our findings also suggest that supervised learning methods have enormous potential as they require very little examples to produce excellent policies.


Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state of the art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL, outperforms state of the art algorithms that rely on complex neural network architectures.


Understandable Controller Extraction from Video Observations of Swarms

arXiv.org Artificial Intelligence

Swarm behavior emerges from the local interaction of agents and their environment often encoded as simple rules. Extracting the rules by watching a video of the overall swarm behavior could help us study and control swarm behavior in nature, or artificial swarms that have been designed by external actors. It could also serve as a new source of inspiration for swarm robotics. Yet extracting such rules is challenging as there is often no visible link between the emergent properties of the swarm and their local interactions. To this end, we develop a method to automatically extract understandable swarm controllers from video demonstrations. The method uses evolutionary algorithms driven by a fitness function that compares eight high-level swarm metrics. The method is able to extract many controllers (behavior trees) in a simple collective movement task. We then provide a qualitative analysis of behaviors that resulted in different trees, but similar behaviors. This provides the first steps toward automatic extraction of swarm controllers based on observations.


Cooperative Online Learning in Stochastic and Adversarial MDPs

arXiv.org Artificial Intelligence

We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: \emph{fresh} -- where each agent's trajectory is sampled i.i.d, and \emph{non-fresh} -- where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.


Modelling airborne transmission of SARS-CoV-2 at a local scale

arXiv.org Artificial Intelligence

The coronavirus disease (COVID-19) pandemic has changed our lives and still poses a challenge to science. Numerous studies have contributed to a better understanding of the pandemic. In particular, inhalation of aerosolised pathogens has been identified as essential for transmission. This information is crucial to slow the spread, but the individual likelihood of becoming infected in everyday situations remains uncertain. Mathematical models help estimate such risks. In this study, we propose how to model airborne transmission of SARS-CoV-2 at a local scale. In this regard, we combine microscopic crowd simulation with a new model for disease transmission. Inspired by compartmental models, we describe agents' health status as susceptible, exposed, infectious or recovered. Infectious agents exhale pathogens bound to persistent aerosols, whereas susceptible agents absorb pathogens when moving through an aerosol cloud left by the infectious agent. The transmission depends on the pathogen load of the aerosol cloud, which changes over time. We propose a 'high risk' benchmark scenario to distinguish critical from non-critical situations. Simulating indoor situations show that the new model is suitable to evaluate the risk of exposure qualitatively and, thus, enables scientists or even decision-makers to better assess the spread of COVID-19 and similar diseases.


EvolvingBehavior: Towards Co-Creative Evolution of Behavior Trees for Game NPCs

arXiv.org Artificial Intelligence

To assist game developers in crafting game NPCs, we present EvolvingBehavior, a novel tool for genetic programming to evolve behavior trees in Unreal Engine 4. In an initial evaluation, we compare evolved behavior to hand-crafted trees designed by our researchers, and to randomly-grown trees, in a 3D survival game. We find that EvolvingBehavior is capable of producing behavior approaching the designer's goals in this context. Finally, we discuss implications and future avenues of exploration for co-creative game AI design tools, as well as challenges and difficulties in behavior tree evolution.


On Almost-Sure Intention Deception Planning that Exploits Imperfect Observers

arXiv.org Artificial Intelligence

Intention deception involves computing a strategy which deceives the opponent into a wrong belief about the agent's intention or objective. This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender's limited sensing modality can be exploited by an attacker to achieve its attack objective almost surely (with probability one) while hiding its intention. In particular, we model the attack planning in a stochastic system modeled as a Markov decision process (MDP). The attacker is to reach some target states while avoiding unsafe states in the system and knows that his behavior is monitored by a defender with partial observations. Given partial state observations for the defender, we develop qualitative intention deception planning algorithms that construct attack strategies to play against an action-visible defender and an action-invisible defender, respectively. The synthesized attack strategy not only ensures the attack objective is satisfied almost surely but also deceives the defender into believing that the observed behavior is generated by a normal/legitimate user and thus failing to detect the presence of an attack. We show the proposed algorithms are correct and complete and illustrate the deceptive planning methods with examples.


Leveraging Heterogeneous Capabilities in Multi-Agent Systems for Environmental Conflict Resolution

arXiv.org Artificial Intelligence

In this paper, we introduce a high-level controller synthesis framework that enables teams of heterogeneous agents to assist each other in resolving environmental conflicts that appear at runtime. This conflict resolution method is built upon temporal-logic-based reactive synthesis to guarantee safety and task completion under specific environment assumptions. In heterogeneous multi-agent systems, every agent is expected to complete its own tasks in service of a global team objective. However, at runtime, an agent may encounter un-modeled obstacles (e.g., doors or walls) that prevent it from achieving its own task. To address this problem, we employ the capabilities of other heterogeneous agents to resolve the obstacle. A controller framework is proposed to redirect agents with the capability of resolving the appropriate obstacles to the required target when such a situation is detected. Three case studies involving a bipedal robot Digit and a quadcopter are used to evaluate the controller performance in action. Additionally, we implement the proposed framework on a physical multi-agent robotic system to demonstrate its viability for real world applications.


CASPER: Cognitive Architecture for Social Perception and Engagement in Robots

arXiv.org Artificial Intelligence

Our world is being increasingly pervaded by intelligent robots with varying degrees of autonomy. To seamlessly integrate themselves in our society, these machines should possess the ability to navigate the complexities of our daily routines even in the absence of a human's direct input. In other words, we want these robots to understand the intentions of their partners with the purpose of predicting the best way to help them. In this paper, we present CASPER (Cognitive Architecture for Social Perception and Engagement in Robots): a symbolic cognitive architecture that uses qualitative spatial reasoning to anticipate the pursued goal of another agent and to calculate the best collaborative behavior. This is performed through an ensemble of parallel processes that model a low-level action recognition and a high-level goal understanding, both of which are formally verified. We have tested this architecture in a simulated kitchen environment and the results we have collected show that the robot is able to both recognize an ongoing goal and to properly collaborate towards its achievement. This demonstrates a new use of Qualitative Spatial Relations applied to the problem of intention reading in the domain of human-robot interaction.