communication message
Multi-Agent Reinforcement Learning with Communication-Constrained Priors
Yang, Guang, Yang, Tianpei, Qiao, Jingwen, Wu, Yanqing, Huo, Jing, Chen, Xingguo, Gao, Yang
Communication is an effective means of improving cooperative policy learning in multi-agent systems. However, lossy communication is prevalent in most real-world scenarios. Existing methods for multi-agent reinforcement learning with communication have limited scalability and robustness, and therefore struggle in complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model that uniformly characterizes communication conditions across different scenarios. We then use this model as a learning prior to distinguish between lossy and lossless messages in specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making using a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework that quantifies the impact of communication messages in the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.
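As a rough illustration of what a lossy communication condition can look like in simulation, the following sketch drops each agent's message independently with a fixed probability and adds Gaussian noise to the ones that arrive. The function name `lossy_channel` and the parameters `drop_prob` and `noise_std` are hypothetical and chosen only for this example; the abstract does not specify the form of the paper's communication-constrained model, so this is not that model.

```python
import numpy as np

def lossy_channel(messages, drop_prob, noise_std, rng):
    """Hypothetical lossy channel (illustrative assumption, not the paper's model).

    messages: array of shape (n_agents, msg_dim).
    Each message is dropped (replaced by zeros) independently with
    probability drop_prob; delivered messages receive Gaussian noise.
    Returns (received, delivered), where delivered is a boolean mask
    with one entry per message.
    """
    messages = np.asarray(messages, dtype=float)
    # rng.random() draws from [0, 1), so drop_prob=1.0 drops every message.
    delivered = rng.random(messages.shape[0]) >= drop_prob
    noise = rng.normal(0.0, noise_std, size=messages.shape)
    received = np.where(delivered[:, None], messages + noise, 0.0)
    return received, delivered

rng = np.random.default_rng(0)
msgs = np.ones((4, 3))
received, delivered = lossy_channel(msgs, drop_prob=0.5, noise_std=0.1, rng=rng)
print(delivered)
print(received)
```

The returned mask is the kind of signal a learner could use to treat delivered and dropped messages differently, which is the distinction the abstract draws between lossless and lossy messages.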
a03caec56cd82478bf197475b48c05f9-Supplemental.pdf
Algorithm 1 shows the pseudocode of LIAM.
Algorithm 1: Pseudocode of LIAM's algorithm. for m = 1, ..., M episodes do: reset the hidden state of the encoder LSTM; sample E fixed policies from Π; create E parallel environments and gather initial observations [...]
The fixed policies in the predator-prey consist of a combination of heuristic and pretrained policies. First we created four heuristic policies, which are: (i) going after the prey, (ii) going after one of the predators, (iii) going after the agent (predator or prey) that is closest, (iv) going after the predator that is closest.
CARL has access to the trajectories of all the other agents in the environment during training, but during execution only to the local trajectory. To extract such representations, we use self-supervised learning based on recent advances on contrastive learning [Oord et al., 2018, He et al., 2020, Chen et al., 2020a,b]. During training and given a batch of episode trajectories we construct the positive and negative pairs following Equation (4) and minimise the InfoNCE loss [Oord et al., 2018].
Following the work of Chung et al. [2015] we can write the lower bound in the log-evidence of the [...]
We train LIAM-VAE similarly to LIAM.
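The InfoNCE loss mentioned above can be sketched in a few lines. The version below is a generic in-batch formulation, assuming that row i of `keys` is the positive for row i of `queries` and that every other row in the batch is a negative. That pairing, the cosine similarity, and the `temperature` value are assumptions made for illustration; Equation (4) is not reproduced in this excerpt, so the actual construction of positive and negative pairs may differ.

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.1):
    """Generic in-batch InfoNCE (illustrative assumption about the pairing).

    queries, keys: arrays of shape (batch, dim). keys[i] is treated as the
    positive for queries[i]; all other keys act as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                        # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the target class.
    return -np.mean(np.diag(log_probs))

embeddings = np.eye(4)
aligned = info_nce_loss(embeddings, embeddings)
mismatched = info_nce_loss(embeddings, np.roll(embeddings, 1, axis=0))
print(aligned, mismatched)
```

When each query matches its own key the loss is close to zero, and it grows when the positives are misaligned, which is the behaviour a contrastive encoder is trained to exploit.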
Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence
Contractor, Faizan, Li, Li, Mallah, Ranwa Al
Popular methods in cooperative Multi-Agent Reinforcement Learning with partially observable environments typically allow agents to act independently during execution, which may limit the coordinated effect of the trained policies. However, by sharing information such as known or suspected ongoing threats, effective communication can lead to improved decision-making in the cyber battle space. We propose a game design where defender agents learn to communicate and defend against imminent cyber threats by playing training games in the Cyber Operations Research Gym, using the Differentiable Inter Agent Learning algorithm adapted to the cyber operational environment. The tactical policies learned by these autonomous agents are akin to those of human experts during incident responses to avert cyber threats. In addition, the agents simultaneously learn minimal cost communication messages while learning their defence tactical policies.
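For readers unfamiliar with Differentiable Inter Agent Learning, the sketch below illustrates the discretise/regularise unit from the original DIAL formulation: during training a message logit is perturbed with Gaussian noise and squashed by a sigmoid so that gradients can flow between agents, while during execution it is thresholded to a hard bit. The function name `dru` and the `noise_std` default are choices made for this example, and how the algorithm was adapted to the cyber operational environment is not described in the abstract.

```python
import numpy as np

def dru(message_logits, training, noise_std=2.0, rng=None):
    """Sketch of a discretise/regularise unit (names and defaults are assumptions).

    training=True:  sigmoid(logits + Gaussian noise), a soft message in [0, 1].
    training=False: hard threshold at zero, a message in {0, 1}.
    """
    logits = np.asarray(message_logits, dtype=float)
    if training:
        if rng is None:
            rng = np.random.default_rng()
        noisy = logits + rng.normal(0.0, noise_std, size=logits.shape)
        return 1.0 / (1.0 + np.exp(-noisy))
    return (logits > 0).astype(float)

logits = np.array([-3.0, -0.5, 0.5, 3.0])
print(dru(logits, training=True, rng=np.random.default_rng(0)))
print(dru(logits, training=False))
```

The noise added during training pushes the agents towards logits that are far from zero, so the hard thresholding used at execution time loses little information.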
Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication
Li, Huao, Mahjoub, Hossein Nourkhiz, Chalaki, Behdad, Tadiparthi, Vaishnav, Lee, Kwonjoon, Moradi-Pari, Ehsan, Lewis, Charles Michael, Sycara, Katia P
Multi-Agent Reinforcement Learning (MARL) methods have shown promise in enabling agents to learn a shared communication protocol from scratch and accomplish challenging team tasks. However, the learned language is usually not interpretable to humans or other agents not co-trained together, limiting its applicability in ad-hoc teamwork scenarios. In this work, we propose a novel computational pipeline that aligns the communication space between MARL agents with an embedding space of human natural language by grounding agent communications on synthetic data generated by embodied Large Language Models (LLMs) in interactive teamwork scenarios. Our results demonstrate that introducing language grounding not only maintains task performance but also accelerates the emergence of communication. Furthermore, the learned communication protocols exhibit zero-shot generalization capabilities in ad-hoc teamwork scenarios with unseen teammates and novel task states. This work presents a significant step toward enabling effective communication and collaboration between artificial agents and humans in real-world teamwork settings.
Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration
Liu, Xinzhu, Li, Peiyan, Yang, Wenju, Guo, Di, Liu, Huaping
Compared with the widely investigated homogeneous multi-robot collaboration, heterogeneous robots with different capabilities can provide a more efficient and flexible collaboration for more complex tasks. In this paper, we consider a more challenging heterogeneous ad hoc teamwork collaboration problem where an ad hoc robot joins an existing heterogeneous team for a shared goal. Specifically, the ad hoc robot collaborates with unknown teammates without prior coordination, and it is expected to generate an appropriate cooperation policy to improve the efficiency of the whole team. To solve this challenging problem, we leverage the remarkable potential of the large language model (LLM) to establish a decentralized heterogeneous ad hoc teamwork collaboration framework that focuses on generating reasonable policy for an ad hoc robot to collaborate with original heterogeneous teammates. A training-free hierarchical dynamic planner is developed using the LLM together with the newly proposed Interactive Reflection of Thoughts (IRoT) method for the ad hoc agent to adapt to different teams. Then, the new team collaborates and finally finishes the task.
Imagine after a natural disaster such as an earthquake or hurricane, a team of robots is dispatched for the rescue task. Since the situation of a disaster site is complex, robots of different capabilities may be required for the rescue. These robots are likely to be brought from different places and thus arrive at the site at different times. The coming robot doesn't have any prior information on existing teammates, and it is expected to collaborate efficiently and robustly with previously unknown teammates for the same goal. [...] describes a typical heterogeneous ad hoc teamwork, and the new coming robot is called an ad hoc robot. [...] heterogeneous ad hoc teamwork collaboration is demonstrated in Figure 1, where heterogeneous robots of different capabilities can compose any team, and the original heterogeneous team collaborates to execute a task. An ad hoc robot could join this team at any time from any location, and then a heterogeneous [...]
During the past years, the multi-robot collaboration task has been widely investigated, and a bunch of multi-agent embodied tasks are proposed where multiple agents learn proper strategies to collaborate efficiently [17, 18, 22, 23, 42, 44, 45, 52] and solve complex embodied tasks [27, 35]. All these works only consider homogeneous agents with the same [...] However, in real-world applications, the robots may be faced with more complicated situations such as seismic [...] It is necessary to leverage robots with different capabilities to accomplish the task better [14, 19, 36, 37, 41]. Meanwhile, the ad hoc teamwork collaboration is an important problem in the heterogeneous multi-robot collaboration, which has been rarely addressed.
Beijing University of Posts and Telecommunications, Beijing, China.
Intent Profiling and Translation Through Emergent Communication
Mostafa, Salwa, Elbamby, Mohammed S., Abdel-Aziz, Mohamed K., Bennis, Mehdi
To effectively express and satisfy network application requirements, intent-based network management has emerged as a promising solution. In intent-based methods, users and applications express their intent to the network in a high-level abstract language. Although this abstraction simplifies network operation, it makes it challenging to express applications' intents efficiently and to map them to different network capabilities. Therefore, in this work, we propose an AI-based framework for intent profiling and translation. We consider a scenario where applications interacting with the network express their needs for network services in their domain language. Machine-to-machine communication (i.e., between applications and the network) is complex because it requires networks to learn the domain language of each application, which is neither practical nor scalable. Instead, a framework based on emergent communication is proposed for intent profiling, in which applications express their abstract quality-of-experience (QoE) intents to the network through emergent communication messages. Subsequently, the network learns how to interpret these communication messages and map them to network capabilities (i.e., slices) to guarantee the requested Quality-of-Service (QoS). Simulation results show that the proposed method outperforms self-learning slicing and other baselines, and achieves a performance close to the perfect knowledge baseline.
Emergent Communication in Multi-Agent Reinforcement Learning for Future Wireless Networks
Chafii, Marwa, Naoumi, Salmane, Alami, Reda, Almazrouei, Ebtesam, Bennis, Mehdi, Debbah, Merouane
In different wireless network scenarios, multiple network entities need to cooperate in order to achieve a common task with minimum delay and energy consumption. Future wireless networks mandate exchanging high dimensional data in dynamic and uncertain environments; implementing communication control tasks therefore becomes challenging and highly complex. Multi-agent reinforcement learning with emergent communication (EC-MARL) is a promising solution to address high dimensional continuous control problems with partially observable states in a cooperative fashion, where agents build an emergent communication protocol to solve complex tasks. This paper articulates the importance of EC-MARL within the context of future 6G wireless networks, where it imbues network entities with autonomous decision-making capabilities to solve complex tasks such as autonomous driving, robot navigation, flying base stations network planning, and smart city applications. An overview of EC-MARL algorithms and their design criteria is provided, along with use cases and research opportunities on this emerging topic.