Agents
Hierarchical Reinforcement Learning for Optimal Control of Linear Multi-Agent Systems: the Homogeneous Case
Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya
Individual agents in a multi-agent system (MAS) may have decoupled open-loop dynamics, but a cooperative control objective usually results in coupled closed-loop dynamics, thereby making the control design computationally expensive. The computation time becomes even higher when a learning strategy such as reinforcement learning (RL) must be applied because the agents' dynamics are not known. To resolve this problem, this paper proposes a hierarchical RL scheme for a linear quadratic regulator (LQR) design in a continuous-time linear MAS. The idea is to exploit the structural properties of two graphs embedded in the $Q$ and $R$ weighting matrices of the LQR objective to define an orthogonal transformation that converts the original LQR design into multiple decoupled, smaller LQR designs. We show that if the MAS is homogeneous, this decomposition retains closed-loop optimality. Conditions for decomposability, an algorithm for constructing the transformation matrix, a hierarchical RL algorithm, and a robustness analysis for the case where the design is applied to a non-homogeneous MAS are presented. Simulations show that the proposed approach achieves a significant speed-up in learning without any loss in the cumulative value of the LQR cost.
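To illustrate why an orthogonal transformation can split the LQR design, here is a minimal model-based sketch (no RL) for one simple sufficient case that we assume for illustration: $N$ identical agents with $Q = Q_g \otimes Q_a$ and $R = R_g \otimes R_a$, where the graph matrices $Q_g$ and $R_g$ share an orthonormal eigenbasis $T$. The double-integrator agent, the path-graph Laplacian and the names `Qg`, `Rg`, `Qa`, `Ra` are hypothetical choices, not the paper's decomposability conditions or its construction algorithm. Under $z = (T^\top \otimes I)x$ the stacked dynamics stay $I \otimes A$, $I \otimes B$, and the cost becomes block diagonal, so $N$ small Riccati equations reproduce the centralized gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, block_diag

def lqr_gain(A, B, Q, R):
    """Optimal state-feedback gain K (u = -K x) from the continuous-time ARE."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Hypothetical homogeneous MAS: N identical double integrators.
N = 4
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
n, m = B.shape

# Assumed graph-structured weights built from a 4-node path-graph Laplacian.
L = np.array([[1, -1, 0, 0],
              [-1, 2, -1, 0],
              [0, -1, 2, -1],
              [0, 0, -1, 1]], dtype=float)
Qg = L + np.eye(N)          # graph part of Q (positive definite)
Rg = np.eye(N) + 0.5 * L    # graph part of R; a polynomial in Qg, so they commute
Qa, Ra = np.eye(n), np.eye(m)

# Centralized LQR on the stacked (N*n)-dimensional system.
A_big = np.kron(np.eye(N), A)
B_big = np.kron(np.eye(N), B)
K_full = lqr_gain(A_big, B_big, np.kron(Qg, Qa), np.kron(Rg, Ra))

# Orthogonal transformation from the shared eigenvectors of Qg and Rg.
q, T = np.linalg.eigh(Qg)
r = np.diag(T.T @ Rg @ T)

# N decoupled n-dimensional LQR problems, one per transformed mode.
K_modes = [lqr_gain(A, B, q[i] * Qa, r[i] * Ra) for i in range(N)]
K_dec = np.kron(T, np.eye(m)) @ block_diag(*K_modes) @ np.kron(T.T, np.eye(n))

print(np.allclose(K_full, K_dec, atol=1e-6))  # expected: True
```

Each mode only needs an $n$-dimensional Riccati solve instead of one $Nn$-dimensional solve, which is the kind of saving the hierarchical scheme exploits when the gains are learned rather than computed from a known model.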
Multi-Agent Collaboration via Reward Attribution Decomposition
Zhang, Tianjun, Xu, Huazhe, Wang, Xiaolong, Wu, Yi, Keutzer, Kurt, Gonzalez, Joseph E., Tian, Yuandong
Recent advances in multi-agent reinforcement learning (MARL) have achieved superhuman performance in games like Quake 3 and Dota 2. Unfortunately, these techniques require orders of magnitude more training rounds than humans and may not generalize to slightly altered environments or new agent configurations (i.e., ad hoc team play). In this work, we propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play. We first formulate multi-agent collaboration as a joint optimization on reward assignment and show that, under certain conditions, each agent has a decentralized Q-function that is approximately optimal and can be decomposed into two terms: a self-term that relies only on the agent's own state, and an interactive term that depends on the states of nearby agents, which are often observed by the current agent. The two terms are jointly trained using regular DQN, regularized with a Multi-Agent Reward Attribution (MARA) loss that ensures both terms retain their semantics. CollaQ is evaluated on various StarCraft maps, where it outperforms existing state-of-the-art techniques (QMIX, QTRAN, and VDN), improving the win rate by 40% with the same number of environment steps. In the more challenging ad hoc team play setting (i.e., reweighting, adding, or removing units without retraining or fine-tuning), CollaQ outperforms the previous state of the art by over 30%. In recent years, multi-agent deep reinforcement learning (MARL) has drawn increasing interest from the research community. MARL algorithms have shown superhuman-level performance in various games like Dota 2 (Berner et al., 2019), Quake 3 Arena (Jaderberg et al., 2019), and StarCraft (Samvelyan et al., 2019). However, these algorithms (Schulman et al., 2017; Mnih et al., 2013) are far less sample-efficient than humans.
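The two-term decomposition can be sketched as follows. This is a minimal illustration under stated assumptions, not CollaQ's implementation: the linear approximators, the dimensions, the TD target and the weight `alpha` are hypothetical, and the MARA-style penalty encodes our reading that the interactive term should vanish when nearby agents are masked out (CollaQ itself trains neural networks with DQN).

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, SELF_DIM, OTHERS_DIM = 5, 4, 6

# Hypothetical linear approximators standing in for the two Q-networks.
W_self = rng.normal(size=(SELF_DIM, N_ACTIONS))
W_inter = rng.normal(size=(SELF_DIM + OTHERS_DIM, N_ACTIONS))

def q_self(s_self):
    """Self term: depends only on the agent's own state."""
    return s_self @ W_self

def q_inter(s_self, s_others):
    """Interactive term: depends on the agent's state and nearby agents' states."""
    return np.concatenate([s_self, s_others]) @ W_inter

def q_total(s_self, s_others):
    """Decentralized Q-values as the sum of the self and interactive terms."""
    return q_self(s_self) + q_inter(s_self, s_others)

def mara_penalty(s_self, action):
    """Assumed MARA-style regularizer: with nearby agents masked out (zeros),
    the interactive term should contribute nothing to the chosen action."""
    return q_inter(s_self, np.zeros(OTHERS_DIM))[action] ** 2

# Usage: one greedy action and one regularized squared TD loss.
s_self, s_others = rng.normal(size=SELF_DIM), rng.normal(size=OTHERS_DIM)
a = int(np.argmax(q_total(s_self, s_others)))
td_target = 1.0  # hypothetical bootstrapped target r + gamma * max_a' Q(s', a')
alpha = 0.1      # hypothetical weight of the MARA-style term
loss = (q_total(s_self, s_others)[a] - td_target) ** 2 + alpha * mara_penalty(s_self, a)
```

Keeping the self-term blind to other agents is what lets it remain meaningful when units are added or removed; the penalty pushes any agent-independent value out of the interactive term.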
SMAC: Symbiotic Multi-Agent Construction
Wagner, Caleb, Dhanaraj, Neel, Rizzo, Trevor, Contreras, Josue, Liang, Hannan, Lewin, Gregory, Pinciroli, Carlo
We present a novel concept of a heterogeneous, distributed platform for autonomous 3D construction. The platform is composed of two types of robots acting in a coordinated and complementary fashion: (i) a collection of communicating smart construction blocks that behave as a form of growable smart matter and are capable of planning and monitoring their own state and the construction progress; and (ii) a team of inchworm-shaped builder robots designed to navigate and modify the 3D structure, following the guidance of the smart blocks. We describe the design of the hardware and introduce algorithms for navigation and construction that support a wide class of 3D structures. We demonstrate the capabilities of our concept and characterize its performance through simulations and real-robot experiments.
Peer-Assisted Robotic Learning: A Data-Driven Collaborative Learning Approach for Cloud Robotic Systems
Liu, Boyi, Wang, Lujia, Chen, Xinquan, Huang, Lexiong, Xu, Cheng-Zhong
Data-driven deep learning is driving a technological revolution in robotics. However, building a dataset for each local robot is laborious, and data islands between local robots prevent their data from being used collaboratively. To address this issue, this work presents Peer-Assisted Robotic Learning (PARL), which is inspired by peer-assisted learning in cognitive psychology and pedagogy. PARL implements data collaboration within the framework of cloud robotic systems. Robots share both data and models with the cloud after local semantic computing and training. The cloud aggregates the data and performs augmentation, integration, and transfer. Finally, the larger shared dataset in the cloud is used to fine-tune models for the local robots. Furthermore, we propose the DAT Network (Data Augmentation and Transferring Network) to implement the data processing in PARL. The DAT Network augments data collected from multiple local robots. We conduct experiments on a simplified self-driving task for robots (cars). The DAT Network significantly improves data augmentation in self-driving scenarios, and the self-driving results also demonstrate that PARL improves learning through data collaboration among local robots.
Designing Emergency Response Pipelines : Lessons and Challenges
Mukhopadhyay, Ayan, Pettet, Geoffrey, Kochenderfer, Mykel, Dubey, Abhishek
Emergency response to incidents such as accidents, crimes, and fires is a major problem faced by communities. Emergency response management comprises several stages and sub-problems, such as forecasting, resource allocation, and dispatch. Principled approaches to each problem are necessary to create efficient emergency response management (ERM) pipelines. Over the last six years, we have worked with several first-responder organizations to design ERM pipelines. In this paper, we highlight some of the challenges that we have identified and the lessons that we have learned through our experience in this domain. These challenges are particularly relevant for practitioners and researchers, and they are important considerations even in the design of response strategies to mitigate disasters like floods and earthquakes.
QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning
Hu, Jian, Harding, Seth Austin, Wu, Haibin, Hu, Siyue, Liao, Shih-wei
In cooperative Multi-Agent Reinforcement Learning (MARL) under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observations and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that contains no information about this randomness. Our proposed model, QR-MIX, introduces quantile regression, modeling joint state-action values as a distribution by combining QMIX with the Implicit Quantile Network (IQN). However, the monotonicity constraint in QMIX limits the expressiveness of the joint state-action value distribution and may lead to incorrect estimates in non-monotonic cases. We therefore propose a flexible loss function that approximates the monotonicity found in QMIX. Our model is more tolerant not only of the randomness of returns but also of the randomness of the monotonicity constraints. The experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method, QMIX, in the StarCraft Multi-Agent Challenge (SMAC) environment.
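Quantile regression is what lets a value estimate carry information about randomness. The sketch below is the standard quantile Huber loss used by QR-DQN and IQN, written in NumPy for a single return distribution; it is not QR-MIX's training code, and it omits the mixing network and the flexible monotonicity loss, whose exact form is not given here.

```python
import numpy as np

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile regression Huber loss (QR-DQN / IQN style).

    pred:   (K,) predicted return quantiles at fractions `taus`
    target: (M,) samples of the target return distribution
    taus:   (K,) quantile fractions in (0, 1)
    """
    # Pairwise errors u[k, j] = target_j - pred_k
    u = target[None, :] - pred[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}|: high quantiles are penalized more
    # for underestimating, low quantiles more for overestimating.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())
```

For example, with `pred = [0.0]`, `target = [2.0]` and `kappa = 1`, the loss is 1.35 at `tau = 0.9` but only 0.15 at `tau = 0.1`, so minimizing it pushes each output toward its own quantile of the return distribution instead of toward a single mean. In IQN, the fractions `taus` are sampled uniformly at each update rather than fixed.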
Optimal Assistance for Object-Rearrangement Tasks in Augmented Reality
Newman, Benjamin, Carlberg, Kevin, Desai, Ruta
Augmented-reality (AR) glasses with access to onboard sensors and the ability to display relevant information to the user present an opportunity to assist users in quotidian tasks. Many such tasks can be characterized as object-rearrangement tasks. We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display. The embodied agent is a "hybrid" of the AR system and the user, with the AR system's observation space (i.e., sensors) and the user's action space (i.e., task-execution actions); its policy is learned by minimizing the task-completion time. In this initial study, we assume that the AR system's observations include the environment's map and the locations of the objects and the user. These choices allow us to formalize the problem of computing AR assistance for any object-rearrangement task as a planning problem, specifically as a capacitated vehicle-routing problem. Further, we introduce a novel AR simulator that enables web-based evaluation of AR-like assistance and associated at-scale data collection via the Habitat simulator for embodied artificial intelligence. Finally, we perform a study that evaluates user response to the proposed form of AR assistance on a specific quotidian object-rearrangement task, house cleaning, using our AR simulator on Amazon Mechanical Turk. In particular, we study the effect of the proposed AR assistance on users' task performance and sense of agency over a range of task difficulties. Our results indicate that providing users with such assistance improves their overall performance and that, while users report a negative impact on their sense of agency, they may still prefer the proposed assistance to having no assistance at all.
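To make the capacitated vehicle-routing formulation concrete, here is a brute-force sketch under simplifying assumptions that are ours, not the paper's: every object goes to one receptacle (the depot), the user starts there, the capacity is how many objects the user can carry at once, and straight-line travel distance stands in for task-completion time. Enumerating visiting orders and splitting each one optimally into trips gives the exact optimum, but only for a handful of objects; a real planner would use a dedicated VRP solver.

```python
import itertools
import math

def split_into_trips(order, depot, points, capacity):
    """Optimally cut a visiting order into depot-to-depot trips carrying at most
    `capacity` objects each (dynamic programming over cut positions)."""
    n = len(order)
    best = [0.0] + [math.inf] * n   # best[j]: cheapest way to serve order[:j]
    cut = [0] * (n + 1)
    for i in range(n):
        trip = 0.0
        for j in range(i, min(n, i + capacity)):
            here = points[order[j]]
            prev = depot if j == i else points[order[j - 1]]
            trip += math.dist(prev, here)
            total = best[i] + trip + math.dist(here, depot)
            if total < best[j + 1]:
                best[j + 1], cut[j + 1] = total, i
    trips, j = [], n
    while j > 0:  # walk the cuts backwards to recover the trips
        trips.append(list(order[cut[j]:j]))
        j = cut[j]
    return best[n], trips[::-1]

def solve_cvrp(depot, points, capacity):
    """Exact CVRP by enumerating all visiting orders (tiny instances only)."""
    best_cost, best_trips = math.inf, []
    for order in itertools.permutations(range(len(points))):
        cost, trips = split_into_trips(order, depot, points, capacity)
        if cost < best_cost:
            best_cost, best_trips = cost, trips
    return best_cost, best_trips
```

For instance, with the receptacle at `(0, 0)` and objects at `(1, 0)` and `(1, 1)`, a user who can carry two objects should collect both in one trip of length 2 + √2, while a user limited to one object needs two trips of total length 2 + 2√2. The returned trips are the suggestion sequence an AR display could show.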
Collective defense of honeybee colonies: experimental results and theoretical modeling
López-Incera, Andrea, Nouvian, Morgane, Ried, Katja, Müller, Thomas, Briegel, Hans J.
Social insect colonies routinely face large vertebrate predators, against which they need to mount a collective defense. To do so, honeybees use an alarm pheromone that recruits nearby bees into mass stinging of the perceived threat. This alarm pheromone is carried directly on the stinger, so its concentration builds up during the course of the attack. Here, we investigate how individual bees react to different alarm pheromone concentrations, and how this evolved response pattern leads to better coordination at the group level. We first present an experimentally obtained individual dose-response curve to the alarm pheromone. Second, we apply Projective Simulation to model each bee as an artificial learning agent that relies on the pheromone concentration to decide whether or not to sting. If the emergent collective performance benefits the colony, the individual reactions that led to it are reinforced via reinforcement learning, thus emulating natural selection. Predators are modeled realistically so that the effects of factors such as their resistance, their killing rate, or their frequency of attacks can be studied. We are able to reproduce the experimentally measured response pattern of real bees and to identify the main selection pressures that shaped it. Finally, we apply the model to a case study: by tuning the parameters to represent the environmental conditions of European or African bees, we can predict the difference in aggressiveness observed between these two subspecies.
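A two-layer Projective Simulation agent of the kind described can be sketched as below: percepts are discretized pheromone levels, actions are "do not sting" (0) and "sting" (1), and h-values on percept-to-action edges set the action probabilities. This uses the basic PS update without glow; the number of levels, the damping `gamma`, and the reward rule in the usage note are hypothetical, and the paper's colony-level reward and predator model are not reproduced.

```python
import numpy as np

class TwoLayerPS:
    """Minimal two-layer Projective Simulation agent (percept -> action edges).

    h-values start at 1; an action is sampled with probability h / sum(h)
    over the edges leaving the current percept. Update (no glow):
    h <- h - gamma * (h - 1), then the reward is added to the edge just used.
    """

    def __init__(self, n_percepts, n_actions, gamma=0.0, seed=None):
        self.h = np.ones((n_percepts, n_actions))
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)
        self.last = None

    def act(self, percept):
        p = self.h[percept] / self.h[percept].sum()
        action = int(self.rng.choice(len(p), p=p))
        self.last = (percept, action)
        return action

    def learn(self, reward):
        # Damping slowly pulls every h-value back toward its initial value 1.
        self.h -= self.gamma * (self.h - 1.0)
        if self.last is not None:
            self.h[self.last] += reward
```

As a usage example, if stinging at the highest of three pheromone levels is always rewarded (`agent.act(2)` followed by `agent.learn(1.0 if a == 1 else 0.0)` in a loop), the probability of stinging at that level quickly approaches 1, while the untouched levels keep their uniform 50/50 policy. In the colony setting, the reward would instead be shared by all bees according to whether the collective outcome benefited the colony.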
Affect-Driven Modelling of Robot Personality for Collaborative Human-Robot Interactions
Churamani, Nikhil, Barros, Pablo, Gunes, Hatice, Wermter, Stefan
Collaborative interactions require social robots to adapt to the dynamics of human affective behaviour. Yet current approaches to affective behaviour generation in robots focus on instantaneous perception, generating a one-to-one mapping between observed human expressions and static robot actions. In this paper, we propose a novel framework for personality-driven behaviour generation in social robots. The framework consists of (i) a hybrid neural model for evaluating facial expressions and speech, forming intrinsic affective representations in the robot; (ii) an Affective Core that employs self-organising neural models to embed robot personality traits such as patience and emotional actuation; and (iii) a Reinforcement Learning model that uses the robot's affective appraisal to learn interaction behaviour. For evaluation, we conduct a user study (n = 31) in which the NICO robot acts as a proposer in the Ultimatum Game. Participants perceive the effect of robot personality on its negotiation strategy: they rank a patient robot with high emotional actuation higher on persistence, and an inert, impatient robot higher on generosity and altruistic behaviour.