Agents
Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network
Mahmud, Rafid Ameer, Faisal, Fahim, Mahmud, Saaduddin, Khan, Md. Mosaddek
Multi-agent Markov Decision Process (MMDP) has been an effective way of modelling sequential decision making algorithms for multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, dynamically changing environment, coupled with exponential size of the state and joint action space, make it difficult for these algorithms to provide both efficiency and scalability. Recently, Centralized planning algorithm FV-MCTS-MP and decentralized planning algorithm \textit{Alternate maximization with Behavioural Cloning} (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation based online planning algorithm, that we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses Coordination Graph (CG) and Graph Neural Network (GCN) to learn cooperation and provides real time solution of a MMDP problem. It also improves scalability through an effective pruning of action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide theoretical discussion about the convergence property of our algorithm within the context of multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.
ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind
Wang, Yuanfei, Zhong, Fangwei, Xu, Jing, Wang, Yizhou
Being able to predict the mental states of others is a key factor to effective social interaction. It is also crucial for distributed multi-agent systems, where agents are required to communicate and cooperate. In this paper, we introduce such an important social-cognitive skill, i.e. Theory of Mind (ToM), to build socially intelligent agents who are able to communicate and cooperate effectively to accomplish challenging tasks. With ToM, each agent is capable of inferring the mental states and intentions of others according to its (local) observation. Based on the inferred states, the agents decide "when" and with "whom" to share their intentions. With the information observed, inferred, and received, the agents decide their sub-goals and reach a consensus among the team. In the end, the low-level executors independently take primitive actions to accomplish the sub-goals. We demonstrate the idea in two typical target-oriented multi-agent tasks: cooperative navigation and multi-sensor target coverage. The experiments show that the proposed model not only outperforms the state-of-the-art methods on reward and communication efficiency, but also shows good generalization across different scales of the environment.
Simulation of emergence in artificial societies: a practical model-based approach with the EB-DEVS formalism
Foguelman, Daniel, Lanzarotti, Esteban, Ferreyra, Emanuel, Castro, Rodrigo
Modelling and simulation of complex systems is key to exploring and understanding social processes, benefiting from formal mechanisms to derive global-level properties from local-level interactions. In this paper we extend the body of knowledge on formal methods in complex systems by applying EB-DEVS, a novel formalism tailored for the modelling, simulation and live identification of emergent properties. We guide the reader through the implementation of different classical models for varied social systems to introduce good modelling practices and showcase the advantages and limitations of modelling emergence with EB-DEVS, in particular through its live emergence detection capability. This work provides case study-driven evidence for the neatness and compactness of the approach to modelling communication structures that can be explicit or implicit, static or dynamic, with or without multilevel interactions, and with weak or strong emergent behaviour. Throughout examples we show that EB-DEVS permits conceptualising the analysed societies by incorporating emergent behaviour when required, namely by integrating as a macro-level aggregate the Gini index in the Sugarscape model, Fads and Fashion in the Dissemination of Culture model, size-biased degree distribution in a Preferential Attachment model, happiness index in the Segregation model and quarantines in the SIR epidemic model. In each example we discuss the role of communication structures in the development of multilevel simulation models, and illustrate how micro-macro feedback loops enable the modelling of macro-level properties. Our results stress the relevance of multilevel features to support a robust approach in the modelling and simulation of complex systems.
SaLinA: Sequential Learning of Agents
Denoyer, Ludovic, de la Fuente, Alfredo, Duong, Song, Gaya, Jean-Baptiste, Kamienny, Pierre-Alexandre, Thompson, Daniel H.
SaLinA is a simple library that makes implementing complex sequential learning models easy, including reinforcement learning algorithms. It is built as an extension of PyTorch: algorithms coded with \SALINA{} can be understood in few minutes by PyTorch users and modified easily. Moreover, SaLinA naturally works with multiple CPUs and GPUs at train and test time, thus being a good fit for the large-scale training use cases. In comparison to existing RL libraries, SaLinA has a very low adoption cost and capture a large variety of settings (model-based RL, batch RL, hierarchical RL, multi-agent RL, etc.). But SaLinA does not only target RL practitioners, it aims at providing sequential learning capabilities to any deep learning programmer.
TEACh: Task-driven Embodied Agents that Chat
Padmakumar, Aishwarya, Thomason, Jesse, Shrivastava, Ayush, Lange, Patrick, Narayan-Chen, Anjali, Gella, Spandana, Piramuthu, Robinson, Tur, Gokhan, Hakkani-Tur, Dilek
Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human--human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment to complete tasks varying in complexity from "Make Coffee" to "Prepare Breakfast", asking questions and getting additional information from the Commander. We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution.
Safety-aware Policy Optimisation for Autonomous Racing
Chen, Bingqing, Francis, Jonathan, Herman, James, Oh, Jean, Nyberg, Eric, Herbert, Sylvia L.
To be viable for safety-critical applications, such as autonomous driving and assistive robotics, autonomous agents should adhere to safety constraints throughout the interactions with their environments. Instead of learning about safety by collecting samples, including unsafe ones, methods such as Hamilton-Jacobi (HJ) reachability compute safe sets with theoretical guarantees using models of the system dynamics. However, HJ reachability is not scalable to high-dimensional systems, and the guarantees hinge on the quality of the model. In this work, we inject HJ reachability theory into the constrained Markov decision process (CMDP) framework, as a control-theoretical approach for safety analysis via model-free updates on state-action pairs. Furthermore, we demonstrate that the HJ safety value can be learned directly on vision context, the highest-dimensional problem studied via the method to-date. We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently-released high-fidelity autonomous racing environment. Our approach has significantly fewer constraint violations in comparison to other constrained RL baselines, and achieve the new state-of-the-art results on the L2R benchmark task.
The Neural MMO Platform for Massively Multiagent Research
Suarez, Joseph, Du, Yilun, Zhu, Clare, Mordatch, Igor, Isola, Phillip
Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems. Existing environments feature subsets of these properties, but Neural MMO is the first to combine them all. We present Neural MMO as free and open source software with active support, ongoing development, documentation, and additional training, logging, and visualization tools to help users adapt to this new setting. Initial baselines on the platform demonstrate that agents trained in large populations explore more and learn a progression of skills. We raise other more difficult problems such as many-team cooperation as open research questions which Neural MMO is well-suited to answer. Finally, we discuss current limitations of the platform, potential mitigations, and plans for continued development.
HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism
Xu, Zhiwei, Bai, Yunpeng, Zhang, Bin, Li, Dapeng, Fan, Guoliang
Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents. In this paper, we propose a novel value decomposition framework HAVEN based on hierarchical reinforcement learning for the fully cooperative multi-agent problems. In order to address instabilities that arise from the concurrent optimization of high-level and low-level policies and another concurrent optimization of agents, we introduce the dual coordination mechanism of inter-layer strategies and inter-agent strategies. HAVEN does not require domain knowledge and pretraining at all, and can be applied to any value decomposition variants. Our method is demonstrated to achieve superior results to many baselines on StarCraft II micromanagement tasks and offers an efficient solution to multi-agent hierarchical reinforcement learning in fully cooperative scenarios.
Towards a fully RL-based Market Simulator
Ardon, Leo, Vadori, Nelson, Spooner, Thomas, Xu, Mengda, Vann, Jared, Ganesh, Sumitra
We present a new financial framework where two families of RL-based agents representing the Liquidity Providers and Liquidity Takers learn simultaneously to satisfy their objective. Thanks to a parametrized reward formulation and the use of Deep RL, each group learns a shared policy able to generalize and interpolate over a wide range of behaviors. This is a step towards a fully RL-based market simulator replicating complex market conditions particularly suited to study the dynamics of the financial market under various scenarios.
OPEn: An Open-ended Physics Environment for Learning Without a Task
Gan, Chuang, Bhandwaldar, Abhishek, Torralba, Antonio, Tenenbaum, Joshua B., Isola, Phillip
Humans have mental models that allow them to plan, experiment, and reason in the physical world. How should an intelligent agent go about learning such models? In this paper, we will study if models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks. To this end, we build a benchmark Open-ended Physics ENvironment (OPEn) and also design several tasks to test learning representations in this environment explicitly. This setting reflects the conditions in which real agents (i.e. rolling robots) find themselves, where they may be placed in a new kind of environment and must adapt without any teacher to tell them how this environment works. This setting is challenging because it requires solving an exploration problem in addition to a model building and representation learning problem. We test several existing RL-based exploration methods on this benchmark and find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results. However, all models still fall short in sample efficiency when transferring to the downstream tasks. We expect that OPEn will encourage the development of novel rolling robot agents that can build reusable mental models of the world that facilitate many tasks.