Agents
Self-Organization and Artificial Life
Gershenson, Carlos, Trianni, Vito, Werfel, Justin, Sayama, Hiroki
Self-organization can be broadly defined as the ability of a system to display ordered spatio-temporal patterns solely as the result of the interactions among the system components. Processes of this kind characterize both living and artificial systems, making self-organization a concept that is at the basis of several disciplines, from physics to biology to engineering. Placed at the frontiers between disciplines, Artificial Life (ALife) has heavily borrowed concepts and tools from the study of self-organization, providing mechanistic interpretations of life-like phenomena as well as useful constructivist approaches to artificial system design. Despite its broad usage within ALife, the concept of self-organization has been often excessively stretched or misinterpreted, calling for a clarification that could help with tracing the borders between what can and cannot be considered self-organization. In this review, we discuss the fundamental aspects of self-organization and list the main usages within three primary ALife domains, namely "soft" (mathematical/computational modeling), "hard" (physical robots), and "wet" (chemical/biological systems) ALife. Finally, we discuss the usefulness of self-organization within ALife studies, point to perspectives for future research, and list open questions.
Inferring Personalized Bayesian Embeddings for Learning from Heterogeneous Demonstration
Paleja, Rohan, Gombolay, Matthew
For assistive robots and virtual agents to achieve ubiquity, machines will need to anticipate the needs of their human counterparts. The field of Learning from Demonstration (LfD) has sought to enable machines to infer predictive models of human behavior for autonomous robot control. However, humans exhibit heterogeneity in decision-making, which traditional LfD approaches fail to capture. To overcome this challenge, we propose a Bayesian LfD framework to infer an integrated representation of all human task demonstrators by inferring human-specific embeddings, thereby distilling their unique characteristics. We validate that our approach outperforms state-of-the-art techniques on both synthetic and real-world data sets.
Toward Imitating Visual Attention of Experts in Software Development Tasks
Ikutani, Yoshiharu, Koganti, Nishanth, Hata, Hideaki, Kubo, Takatomi, Matsumoto, Kenichi
Expert programmers' eye movements during source code reading are valuable sources that are considered to be associated with their domain expertise. We advocate a vision of new intelligent systems incorporating the expertise of experts for software development tasks, such as issue localization, comment generation, and code generation. We present a conceptual framework of neural autonomous agents based on imitation learning (IL), which enables agents to mimic the visual attention of an expert via their eye movements. In this framework, an autonomous agent is constructed as a context-based attention model that consists of an encoder/decoder network and is trained with state-action sequences generated by experts' demonstrations. Challenges in implementing an IL-based autonomous agent specialized for software development tasks are discussed in this paper.
Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm
Nguyen, Duc Thien, Yeoh, William, Lau, Hoong Chuin, Zivan, Roie
Researchers have used distributed constraint optimization problems (DCOPs) to model various multi-agent coordination and resource allocation problems. Very recently, Ottens et al. proposed a promising new approach to solve DCOPs that is based on confidence bounds via their Distributed UCT (DUCT) sampling-based algorithm. Unfortunately, its memory requirement per agent is exponential in the number of agents in the problem, which prohibits it from scaling up to large problems. Thus, in this article, we introduce two new sampling-based DCOP algorithms called Sequential Distributed Gibbs (SD-Gibbs) and Parallel Distributed Gibbs (PD-Gibbs). Both algorithms have memory requirements per agent that are linear in the number of agents in the problem. Our empirical results show that our algorithms can find better solutions than DUCT, run faster than DUCT, and solve some large problems that DUCT failed to solve due to memory limitations.
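The Gibbs sampling idea named in the abstract above can be illustrated with a minimal, centralized sketch. This is not the SD-Gibbs or PD-Gibbs message-passing protocol: the function name `gibbs_dcop`, the exponential weighting of local utilities, and the three-variable example are all assumptions made for illustration. The point it tries to show is that resampling one variable only requires the current values of its neighbors.

```python
import math
import random


def gibbs_dcop(domains, constraints, iterations=200, seed=0):
    """Centralized Gibbs sampling sketch for a utility-maximizing constraint problem.

    domains: dict mapping variable name -> list of values (hypothetical format)
    constraints: dict mapping (u, v) -> function(value_u, value_v) -> utility
    """
    rng = random.Random(seed)
    assignment = {var: rng.choice(dom) for var, dom in domains.items()}

    def local_utility(var, value):
        # Utility of the constraints touching `var`, given neighbors' current values.
        total = 0.0
        for (u, w), f in constraints.items():
            if u == var:
                total += f(value, assignment[w])
            elif w == var:
                total += f(assignment[u], value)
        return total

    def total_utility(a):
        # Global bookkeeping; a distributed algorithm would have to track this differently.
        return sum(f(a[u], a[w]) for (u, w), f in constraints.items())

    best, best_utility = dict(assignment), total_utility(assignment)
    for _ in range(iterations):
        for var, domain in domains.items():
            # Sample a new value with probability proportional to exp(local utility).
            weights = [math.exp(local_utility(var, val)) for val in domain]
            assignment[var] = rng.choices(domain, weights=weights, k=1)[0]
            utility = total_utility(assignment)
            if utility > best_utility:
                best, best_utility = dict(assignment), utility
    return best, best_utility


# Toy chain a - b - c where neighboring variables are rewarded for differing.
def differ(x, y):
    return 1.0 if x != y else 0.0


domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
constraints = {("a", "b"): differ, ("b", "c"): differ}
best, best_utility = gibbs_dcop(domains, constraints)
```

In this sketch the sampler keeps one current value per variable plus the best assignment seen so far, so its state grows with the number of variables rather than with the number of joint assignments.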
Autonomous system uses quadcopters to help wheeled robots climb steep cliffs
Sheer cliff faces present a traversal challenge for most wheeled robots on the market, but researchers at the University of Tokyo say they've developed a two-robot framework that works pretty reliably in their testing. "[We] propose a novel cooperative system for an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV) which utilizes the UAV not only as a flying sensor but also as a tether attachment device," the authors explain in a newly published paper on the preprint server Arxiv.org. "[It enhances] the poor traversability of the UGV by not only providing a wider range of scanning and mapping from the air, but also by allowing the UGV to climb steep terrains with the winding of the tether." The UGV is permanently attached via mechanized winch and cable to the UAV, a custom-made quadcopter with an Nvidia Jetson TX2 chipset, a flight controller, and a raft of sensors including a modular fisheye camera, time-of-flight sensor, inertial measurement unit (IMU), and laser sensor.
Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning
Bhattacharyya, Raunak P., Phillips, Derek J., Liu, Changliu, Gupta, Jayesh K., Driggs-Campbell, Katherine, Kochenderfer, Mykel J.
Recent developments in multi-agent imitation learning have shown promising results for modeling the behavior of human drivers. However, it is challenging to capture emergent traffic behaviors that are observed in real-world datasets. Such behaviors arise due to the many local interactions between agents that are not commonly accounted for in imitation learning. This paper proposes Reward Augmented Imitation Learning (RAIL), which integrates reward augmentation into the multi-agent imitation learning framework and allows the designer to specify prior knowledge in a principled fashion. We prove that convergence guarantees for the imitation learning process are preserved under the application of reward augmentation. This method is validated in a driving scenario, where an entire traffic scene is controlled by driving policies learned using our proposed algorithm. Further, we demonstrate improved performance in comparison to traditional imitation learning algorithms both in terms of the local actions of a single agent and the behavior of emergent properties in complex, multi-agent settings.
VRKitchen: an Interactive 3D Virtual Environment for Task-oriented Learning
Gao, Xiaofeng, Gong, Ran, Shu, Tianmin, Xie, Xu, Wang, Shu, Zhu, Song-Chun
One of the main challenges of advancing task-oriented learning such as visual task planning and reinforcement learning is the lack of realistic and standardized environments for training and testing AI agents. Previously, researchers often relied on ad-hoc lab environments. There have been recent advances in virtual systems built with 3D physics engines and photo-realistic rendering for indoor and outdoor environments, but the embodied agents in those systems can only conduct simple interactions with the world (e.g., walking around or moving objects). Most of the existing systems also do not allow human participation in their simulated environments. In this work, we design and implement a virtual reality (VR) system, VRKitchen, with integrated functions which i) enable embodied agents powered by modern AI methods (e.g., planning and reinforcement learning) to perform complex tasks involving a wide range of fine-grained object manipulations in a realistic environment, and ii) allow human teachers to perform demonstrations to train agents (i.e., learning from demonstration). We also provide standardized evaluation benchmarks and data collection tools to facilitate broad use in research on task-oriented learning and beyond.
Single Deep Counterfactual Regret Minimization
Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding approximate Nash equilibria in imperfect information games. However, CFR's reliance on full game-tree traversals limits its scalability. For this reason, the game's state and action spaces are often abstracted (i.e., simplified) for CFR, and the resulting strategy is then translated back to the full game, which requires extensive expert knowledge and often converges to highly exploitable policies. A recently proposed method, Deep CFR, applies deep learning directly to CFR, allowing the agent to intrinsically abstract and generalize over the state space from samples, without requiring expert knowledge. In this paper, we introduce Single Deep CFR (SD-CFR), a simplified variant of Deep CFR that has a lower overall approximation error by avoiding the training of an average strategy network. We show that SD-CFR is more attractive from a theoretical perspective and empirically outperforms Deep CFR with respect to exploitability and one-on-one play in poker.
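For readers unfamiliar with regret minimization, the sketch below shows tabular regret matching, the per-decision update that CFR builds on, applied to a one-shot two-player zero-sum matrix game. It is not CFR on a game tree, and it is neither Deep CFR nor SD-CFR. The function names and the 2x2 payoff matrix are assumptions chosen for illustration. The payoff matrix has a mixed equilibrium in which each player picks their first action with probability 0.4, and the time-averaged strategies should approach it.

```python
def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


def solve_matrix_game(payoff, iterations=20000):
    """Self-play with regret matching; payoff[i][j] is the row player's payoff."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_expected = sum(row[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(col[j] * col_values[j] for j in range(n_cols))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += col[j]
    # The averaged strategies, not the final ones, approximate the equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


# Hypothetical zero-sum game whose equilibrium is (0.4, 0.6) for both players.
payoff = [[2.0, -1.0],
          [-1.0, 1.0]]
row_avg, col_avg = solve_matrix_game(payoff)
```

The averaged strategy returned here is the tabular analogue of the average strategy that, per the abstract, Deep CFR approximates with a separate network and SD-CFR avoids training.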
On the Pitfalls of Measuring Emergent Communication
Lowe, Ryan, Foerster, Jakob, Boureau, Y-Lan, Pineau, Joelle, Dauphin, Yann
How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agents' learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents' behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or the other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.
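The pitfall described above can be made concrete with a small sketch. Empirical mutual information is used here as one plausible way to quantify how much a message tells us about an action; it is an assumption for illustration and not necessarily the exact metric used in the paper. The episode data is hand-written and hypothetical: the speaker's message always matches its own next action, while the listener ignores the message entirely.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information, in bits, between the two items of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (left[x] * right[y]).
        mi += (count / n) * math.log2(count * n / (left[x] * right[y]))
    return mi


# Hypothetical episodes: (message, speaker_action, listener_action).
episodes = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

# The message predicts the speaker's own action perfectly...
speaker_mi = mutual_information([(m, a) for m, a, _ in episodes])
# ...but carries no information about what the listener does.
listener_mi = mutual_information([(m, b) for m, _, b in episodes])
```

A metric that only looks at the first quantity would report communication here, even though the listener's behaviour is unaffected, which is the kind of misleading signal the abstract warns about.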
Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces
Fu, Haotian, Tang, Hongyao, Hao, Jianye, Lei, Zihan, Chen, Yingfeng, Fan, Changjie
Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
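To make the term "discrete-continuous hybrid action space" concrete, here is a minimal single-agent sketch of what such an action looks like and how a greedy choice over it might be made. It is not Deep MAPQN or Deep MAHHQN: the `HybridAction` type, the `select_action` helper, the soccer-style action names, and the hand-written `param_fn` and `q_fn` stand in for learned networks and are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class HybridAction:
    kind: str                  # discrete choice, e.g. "dash" or "kick"
    params: Tuple[float, ...]  # continuous parameters attached to that choice


def select_action(
    state: Dict[str, float],
    kinds: Sequence[str],
    param_fn: Callable[[Dict[str, float], str], Tuple[float, ...]],
    q_fn: Callable[[Dict[str, float], HybridAction], float],
) -> HybridAction:
    """Propose parameters for every discrete kind, then pick the highest-valued pair."""
    candidates = [HybridAction(kind, param_fn(state, kind)) for kind in kinds]
    return max(candidates, key=lambda action: q_fn(state, action))


# Hand-written stand-ins for what would normally be learned function approximators.
def toy_param_fn(state, kind):
    if kind == "kick":
        return (80.0, state["goal_angle"])  # (power, direction)
    return (50.0,)                          # dash: (power,)


def toy_q_fn(state, action):
    near_ball = state["ball_distance"] < 1.0
    if action.kind == "kick":
        return 1.0 if near_ball else -1.0
    return 0.0


kinds = ["dash", "kick"]
near = select_action({"ball_distance": 0.5, "goal_angle": 15.0}, kinds, toy_param_fn, toy_q_fn)
far = select_action({"ball_distance": 5.0, "goal_angle": 15.0}, kinds, toy_param_fn, toy_q_fn)
```

Each discrete kind carries its own parameter tuple, possibly of a different length, which is what distinguishes this setting from a purely discrete or purely continuous action space.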