Agents
Emergence of Communication in an Interactive World with Consistent Speakers
Bogin, Ben, Geva, Mor, Berant, Jonathan
Training agents to communicate with one another using only task-based supervision has attracted considerable attention recently, due to the growing interest in developing models for human-agent interaction. Prior work on the topic focused on simple environments, where training with policy gradient was feasible despite the non-stationarity of the agents during training. In this paper, we present a more challenging environment for testing the emergence of communication from raw pixels, where training with policy gradient fails. We propose a new model and training algorithm that use the structure of a learned representation space to produce more consistent speakers in the initial phases of training, which stabilizes learning. We empirically show that our algorithm substantially improves performance compared to policy gradient. We also propose a new alignment-based metric for measuring context-independence in emerged communication, and find that our method increases context-independence compared to policy gradient and other competitive baselines.
Gibson Env: Real-World Perception for Embodied Agents
Xia, Fei, Zamir, Amir, He, Zhi-Yang, Sax, Alexander, Malik, Jitendra, Savarese, Silvio
Developing visual perception models for active agents and sensorimotor control is cumbersome in the physical world, as existing algorithms are too slow to learn efficiently in real time and robots are fragile and costly. This has given rise to learning in simulation, which in turn raises the question of whether the results transfer to the real world. In this paper, we are concerned with the problem of developing real-world perception for active agents; we propose the Gibson Virtual Environment for this purpose and showcase sample perceptual tasks learned therein. Gibson is based on virtualizing real spaces, rather than using artificially designed ones, and currently includes over 1400 floor spaces from 572 full buildings. The main characteristics of Gibson are: I. being from the real world and reflecting its semantic complexity, II. having an internal synthesis mechanism, "Goggles", which enables trained models to be deployed in the real world without further domain adaptation, and III. embodying agents and making them subject to the constraints of physics and space.
The Price of Diversity in Assignment Problems
Benabbou, Nawal, Chakraborty, Mithun, Xuan, Vinh Ho, Sliwinski, Jakub, Zick, Yair
We introduce and analyze an extension to the matching problem on a weighted bipartite graph: Assignment with Type Constraints. The two parts of the graph are partitioned into subsets called types and blocks; we seek a matching with the largest sum of weights under the constraint that there is a pre-specified cap on the number of vertices matched in every type-block pair. Our primary motivation stems from the public housing program of Singapore, which accounts for over 70% of its residential real estate. To promote ethnic diversity within its housing projects, Singapore imposes ethnicity quotas: each new housing development comprises blocks of flats, and each ethnicity-based group in the population must not own more than a certain percentage of flats in a block. Other domains using similar hard capacity constraints include matching prospective students to schools or medical residents to hospitals. Limiting agents' choices to ensure diversity in this manner naturally entails some welfare loss. One of our goals is to study the trade-off between diversity and social welfare in such settings. We first show that, while the classic assignment problem is polynomial-time computable, adding diversity constraints makes it computationally intractable; however, we identify a 1/2-approximation algorithm, as well as reasonable assumptions on the weights that permit poly-time algorithms. Next, we provide two upper bounds on the price of diversity (a measure of the loss in welfare incurred by imposing diversity constraints) as functions of natural problem parameters. We conclude the paper with simulations based on publicly available data from two diversity-constrained allocation problems, Singapore Public Housing and Chicago School Choice, which shed light on how the constrained maximization as well as lottery-based variants perform in practice.
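To make the type-block caps and the price of diversity concrete, here is a minimal brute-force sketch on a toy instance. It assumes that the price of diversity is the ratio of the optimal unconstrained welfare to the optimal constrained welfare; the exact normalization used in the paper may differ. The function name `best_assignment` and the instance data are hypothetical. Exhaustive search is only viable for a handful of agents. This matches the abstract's point that the constrained problem is intractable in general. Note also that the unconstrained case could instead be solved in polynomial time with the Hungarian algorithm.

```python
from collections import Counter
from itertools import product

def best_assignment(weights, agent_type, item_block, cap=None):
    """Exhaustive search over matchings of agents to items (toy sizes only).

    weights[a][i]: value of giving item i to agent a.
    cap[(type, block)]: max number of agents of that type matched into that block;
    pairs missing from cap are treated as capacity 0. cap=None means no quotas.
    """
    n_agents, n_items = len(weights), len(weights[0])
    best_value, best_match = 0, {}
    # Each agent either takes one item index or stays unmatched (None).
    for choice in product([None, *range(n_items)], repeat=n_agents):
        match = {a: i for a, i in enumerate(choice) if i is not None}
        if len(set(match.values())) != len(match):
            continue  # some item was given to two agents
        if cap is not None:
            used = Counter((agent_type[a], item_block[i]) for a, i in match.items())
            if any(k > cap.get(pair, 0) for pair, k in used.items()):
                continue  # a type-block quota is exceeded
        value = sum(weights[a][i] for a, i in match.items())
        if value > best_value:
            best_value, best_match = value, match
    return best_value, best_match

# Hypothetical toy instance: agents 0 and 1 have type "A", agent 2 has type "B";
# items 0 and 1 lie in block 0, item 2 lies in block 1.
weights = [[5, 4, 1],
           [4, 5, 1],
           [3, 3, 2]]
agent_type = ["A", "A", "B"]
item_block = [0, 0, 1]
cap = {(t, b): 1 for t in ("A", "B") for b in (0, 1)}  # at most one flat per type per block

opt, _ = best_assignment(weights, agent_type, item_block)           # unconstrained optimum
opt_div, _ = best_assignment(weights, agent_type, item_block, cap)  # optimum under quotas
price_of_diversity = opt / opt_div
print(opt, opt_div, round(price_of_diversity, 3))  # 12 9 1.333
```

In this instance the quota forces one of the two type-A agents out of block 0, which lowers welfare from 12 to 9.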
Decentralized dynamic task allocation for UAVs with limited communication range
Pujol-Gonzalez, Marc, Cerquides, Jesus, Meseguer, Pedro, Rodriguez-Aguilar, Juan A., Tambe, Milind
We present the Limited-range Online Routing Problem (LORP), which involves a team of Unmanned Aerial Vehicles (UAVs) with limited communication range that must autonomously coordinate to service task requests. We first show a general approach to cast this dynamic problem as a sequence of decentralized task allocation problems. Then we present two solutions, both based on modeling the allocation task as a Markov Random Field and subsequently assessing decisions by means of the decentralized Max-Sum algorithm. Our first solution assumes independence between requests, whereas our second solution also considers the UAVs' workloads. A thorough empirical evaluation shows that our workload-based solution consistently outperforms current state-of-the-art methods in a wide range of scenarios, lowering the average service time by up to 16%. In the best-case scenario there is no gap between our decentralized solution and centralized techniques. In the worst-case scenario we reduce the gap between current decentralized and centralized techniques by 25%. Thus, our solution becomes the method of choice for our problem.
Keywords: task allocation, unmanned aerial vehicles, max-sum, decentralized
1. Introduction
Unmanned Aerial Vehicles (UAVs) are an attractive technology for large-area surveillance [1]. Today, there are readily available UAVs that are reasonably cheap, have many sensing abilities, exhibit long endurance and can communicate using radios. UAVs have traditionally been controlled either remotely or by following externally designed flight plans. Requiring human operators for each UAV implies a large, specialized and expensive human workforce. Likewise, letting UAVs follow externally prepared plans introduces a single point of failure (the planner) and requires UAVs with expensive (satellite) radios to maintain continuous communication with a central station.
While these constraints are acceptable in some application domains, other applications require more flexible techniques. For instance, consider a force of park rangers tasked with the surveillance of a large natural park. Upon reception of an emergency notification, the rangers must assess the situation as quickly as possible.
APES: a Python toolbox for simulating reinforcement learning environments
Labash, Aqeel, Tampuu, Ardi, Matiisen, Tambet, Aru, Jaan, Vicente, Raul
Assisted by neural networks, reinforcement learning agents have been able to solve increasingly complex tasks in recent years. The simulation environment in which the agents interact is an essential component of any reinforcement learning problem. The environment simulates the dynamics of the agents' world and hence provides feedback on their actions in terms of state observations and external rewards. To ease the design and simulation of such environments, this work introduces APES, a highly customizable, open-source Python package for creating 2D grid-world environments for reinforcement learning problems. APES equips agents with algorithms to simulate any field of vision, allows items and rewards to be created and positioned according to user-defined rules, and supports the interaction of multiple agents.
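The abstract describes APES only at the feature level. Below is a minimal sketch of what such a 2D grid world involves: agent positions, a square field of vision, reward items, and actions from several agents in one step. The class `GridWorld`, its methods and the action names are hypothetical and invented for this example. They are not the APES API, and the square view is a simplification of the arbitrary fields of vision APES supports.

```python
from dataclasses import dataclass

# Hypothetical action set: (row offset, column offset).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}

@dataclass
class GridWorld:
    height: int
    width: int
    agents: dict            # agent name -> (row, col)
    items: dict             # (row, col) -> reward paid when an agent steps on it
    vision_radius: int = 1  # square field of view with side 2 * radius + 1

    def observe(self, name):
        """Local view: 'A' other agent, '$' item, '#' outside the grid, '.' empty."""
        r0, c0 = self.agents[name]
        others = {pos for other, pos in self.agents.items() if other != name}
        view = []
        for dr in range(-self.vision_radius, self.vision_radius + 1):
            row = ""
            for dc in range(-self.vision_radius, self.vision_radius + 1):
                r, c = r0 + dr, c0 + dc
                if not (0 <= r < self.height and 0 <= c < self.width):
                    row += "#"
                elif (r, c) in others:
                    row += "A"
                elif (r, c) in self.items:
                    row += "$"
                else:
                    row += "."
            view.append(row)
        return view

    def step(self, actions):
        """Move each agent (clipped at the walls) and pay out collected items.

        Agents are processed in dict order, may share a cell (no collision rules),
        and an item is consumed by the first agent that reaches it.
        """
        rewards = {}
        for name, action in actions.items():
            dr, dc = MOVES[action]
            r, c = self.agents[name]
            r = min(max(r + dr, 0), self.height - 1)
            c = min(max(c + dc, 0), self.width - 1)
            self.agents[name] = (r, c)
            rewards[name] = self.items.pop((r, c), 0.0)
        return rewards

world = GridWorld(3, 3, agents={"a": (0, 0), "b": (2, 2)}, items={(0, 1): 1.0})
print(world.observe("a"))                        # ['###', '#.$', '#..']
print(world.step({"a": "right", "b": "stay"}))  # {'a': 1.0, 'b': 0.0}
```

Keeping observation and transition in two separate methods mirrors the usual reinforcement learning loop: observe, choose actions, step, receive rewards.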
Using a Game Engine to Simulate Critical Incidents and Data Collection by Autonomous Drones
Smyth, David L., Glavin, Frank G., Madden, Michael G.
Using a game engine, we have developed a virtual environment that models important aspects of critical incident scenarios. We focused on modelling phenomena related to the identification and gathering of key forensic evidence, in order to develop and test a system that can handle chemical, biological, radiological/nuclear or explosive (CBRNe) events autonomously. This allows us to build and validate AI-based technologies, which can be trained and tested in our custom virtual environment before being deployed in real-world scenarios. We have used our virtual scenario to rapidly prototype a system that can use simulated Remote Aerial Vehicles (RAVs) to gather images from the environment for the purpose of mapping. Our environment provides an effective medium through which we can develop and test various AI methodologies for critical incident scene assessment in a safe and controlled manner.
Theoretical Foundations of the A2RD Project: Part I
Braga, Juliao, Silva, Joao Nuno, Endo, Patricia Takako, Omar, Nizam
The proposal in [24] for an agent communication language (ACL) gave rise to the Java Agent Development Framework (JADE), whose best-known original document is [25], followed by a complementary article [26] and a much more complete text in [27]. The importance of the environment in which the agents interact is characterized very lucidly in [28]. All active FIPA specifications are listed in Table I.
The Disparate Effects of Strategic Manipulation
Hu, Lily, Immorlica, Nicole, Vaughan, Jennifer Wortman
When consequential decisions are informed by algorithmic input, individuals may feel compelled to alter their behavior in order to gain a system's approval. Previous models of agent responsiveness, termed "strategic manipulation," have analyzed the interaction between a learner and agents in a world where all agents are equally able to manipulate their features in an attempt to "trick" a published classifier. In real-world classification, however, an agent's ability to adapt to an algorithm is not simply a function of her personal interest in receiving a positive classification. It is bound up in a complex web of social factors that affect her ability to pursue certain action responses. In this paper, we adapt models of strategic manipulation to better capture dynamics that may arise in a setting of social inequality, wherein candidate groups face different costs to manipulation. We find that whenever one group's costs are higher than the other's, the learner's equilibrium strategy exhibits an inequality-reinforcing phenomenon: the learner erroneously admits some members of the advantaged group while erroneously excluding some members of the disadvantaged group. We also consider the effects of potential interventions in which a learner can subsidize members of the disadvantaged group, lowering their costs in order to improve her own classification performance. Here we encounter a paradoxical result: there exist cases in which providing a subsidy improves only the learner's utility while actually making both candidate groups worse off, even the group receiving the subsidy. Our results reveal the potentially adverse social ramifications of deploying tools that attempt to evaluate an individual's "quality" when agents' capacities to adaptively respond differ.
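The inequality-reinforcing effect can be seen in a deliberately simplified one-dimensional model. The assumptions here are ours, not the paper's exact model. Each candidate has a true score, and the learner publishes a single threshold. Raising one's observed score costs a fixed amount per unit that depends on the candidate's group, and acceptance is worth a benefit of 1. A candidate therefore clears the threshold whenever `cost * (threshold - score) <= benefit`. The function names below are hypothetical.

```python
def effective_bar(threshold, cost_per_unit, benefit=1.0):
    """Lowest true score that can still reach the published threshold.

    Assumes linear manipulation cost and a fixed benefit of being accepted.
    """
    return threshold - benefit / cost_per_unit

def error_regions(threshold, qualification_bar, costs):
    """For each group, the interval of true scores that ends up misclassified."""
    report = {}
    for group, cost in costs.items():
        bar = effective_bar(threshold, cost)
        if bar < qualification_bar:
            # Unqualified candidates who can afford to manipulate are admitted.
            report[group] = ("false positives", (bar, qualification_bar))
        elif bar > qualification_bar:
            # Qualified candidates who cannot afford to manipulate are excluded.
            report[group] = ("false negatives", (qualification_bar, bar))
        else:
            report[group] = ("no errors", None)
    return report

# Hypothetical numbers: true qualification bar 0.5, the advantaged group
# pays half the per-unit manipulation cost of the disadvantaged group.
print(error_regions(0.875, 0.5, {"advantaged": 2.0, "disadvantaged": 4.0}))
# {'advantaged': ('false positives', (0.375, 0.5)), 'disadvantaged': ('false negatives', (0.5, 0.625))}
```

In this toy model the two effective bars always differ by `1/c_adv - 1/c_dis` for any single threshold. No threshold can remove both kinds of error at once when costs differ. If the threshold places the true bar between them, the advantaged group gains false positives while the disadvantaged group suffers false negatives, which is the pattern the abstract describes.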
MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures
Fares, Ahmed, Gomaa, Walid, Khamis, Mohamed A.
The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls together with complementary Dynamic Speed Limits (DSLs). Optimal freeway operation can be reached by minimizing the difference between the freeway density and the critical density at which traffic flow is maximized. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (MARL-FWC) system for ramp metering and DSLs is proposed. MARL-FWC introduces a new microscopic framework at the network level, based on collaborative Markov Decision Process modeling (a Markov game) and an associated cooperative Q-learning algorithm. The technique incorporates payoff propagation (the Max-Plus algorithm) under the coordination graphs framework, which is particularly suited for optimal control purposes. MARL-FWC provides three control designs (fully independent, fully distributed, and centralized), each suited to different network architectures. MARL-FWC was extensively tested to assess the proposed model of the joint payoff as well as the global payoff. Experiments are conducted with heavy traffic flow in the well-known VISSIM traffic simulator to evaluate MARL-FWC. The experimental results show a significant decrease in the total travel time and an increase in the average speed (compared with the base case) while maintaining an optimal traffic flow.
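Payoff propagation with Max-Plus can be sketched on a small coordination graph. In this sketch each agent (for example, a ramp meter) has local payoffs, and each edge of the graph carries a pairwise payoff table. Agents repeatedly pass messages that summarize the best payoff a neighbor can achieve for each of the receiver's actions. On a tree-shaped graph such as a chain of ramps, this converges to the jointly optimal actions. The payoff tables below are hand-made, hypothetical numbers. In MARL-FWC they would come from the learned Q-values, which this sketch does not model. The function name `max_plus` is ours.

```python
import numpy as np

def max_plus(local, pairwise, n_iters=20):
    """Max-Plus message passing on a coordination graph.

    local:    list of 1-D arrays, local[i][a] = payoff of agent i for action a
    pairwise: dict {(i, j): 2-D array}, table[a_i, a_j] = payoff of edge (i, j)
    Returns one action index per agent.
    """
    n = len(local)
    neighbors = {i: set() for i in range(n)}
    edge = {}
    for (i, j), table in pairwise.items():
        neighbors[i].add(j)
        neighbors[j].add(i)
        edge[(i, j)] = table
        edge[(j, i)] = table.T  # same payoffs, indexed [a_j, a_i]
    # msg[(i, j)][a_j]: best payoff agent i can offer for each action of agent j
    msg = {(i, j): np.zeros(len(local[j])) for i in range(n) for j in neighbors[i]}
    for _ in range(n_iters):
        new_msg = {}
        for (i, j) in msg:
            incoming = sum((msg[(k, i)] for k in neighbors[i] - {j}), np.zeros(len(local[i])))
            scores = (local[i] + incoming)[:, None] + edge[(i, j)]  # rows: a_i, cols: a_j
            m = scores.max(axis=0)
            new_msg[(i, j)] = m - m.mean()  # normalize so messages stay bounded on cyclic graphs
        msg = new_msg
    # Each agent picks the action maximizing its local payoff plus incoming messages.
    return [int(np.argmax(local[i] + sum((msg[(k, i)] for k in neighbors[i]), np.zeros(len(local[i])))))
            for i in range(n)]

# Hypothetical chain of three ramp agents 0 - 1 - 2, two actions each.
local = [np.array([1.0, 0.0]), np.array([0.0, 0.5]), np.array([0.0, 0.0])]
pairwise = {(0, 1): np.array([[0.0, 2.0], [0.0, 0.0]]),
            (1, 2): np.array([[0.0, 0.0], [3.0, 0.0]])}
print(max_plus(local, pairwise))  # [0, 1, 0]
```

Enumerating all eight joint actions confirms that (0, 1, 0) is the unique maximum, with total payoff 6.5. On graphs with cycles, Max-Plus is only approximate, and the mean subtraction keeps the messages from growing without bound.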
Behavior Trees as a Representation for Medical Procedures
Hannaford, Blake, Bly, Randall, Humphreys, Ian, Whipple, Mark
Objective: Effective collaboration between machines and clinicians requires flexible data structures to represent medical processes and clinical practice guidelines. Such a data structure could enable effective turn-taking between human and automated components of a complex treatment, accurate online monitoring of clinical treatments (for example, to detect medical errors), or automated treatment systems (such as future medical robots) whose overall treatment plan is understandable and auditable by human experts. Materials and Methods: Behavior trees (BTs) emerged from video game development as a graphical language for modeling intelligent agent behavior. BTs have several properties that make them attractive for modeling medical procedures, including human readability, authoring tools, and composability. Results: This paper illustrates the construction of BTs for exemplary medical procedures and clinical protocols. Discussion and Conclusion: Behavior Trees thus form a useful, human-authorable and human-readable bridge between clinical practice guidelines and AI systems.
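To show why behavior trees are composable and readable, here is a minimal sketch of the two standard control nodes. A Sequence succeeds only if every child succeeds, in order. A Fallback (also called a Selector) tries children until one does not fail. The leaves are condition and action nodes. The class names, state keys and the toy "give medication" tree are hypothetical and purely illustrative. They are not taken from the paper and do not represent a clinical protocol.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"  # for long-running actions; unused in this toy example

class Leaf:
    """Condition or action node wrapping a callable that returns a Status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def tick(self, state):
        return self.fn(state)

class Sequence:
    """Ticks children in order; stops at the first child that does not succeed."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback:
    """Ticks children in order; stops at the first child that does not fail."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE

def check(key):
    """Condition: succeeds if the state flag is set."""
    return lambda s: Status.SUCCESS if s.get(key) else Status.FAILURE

def do(key):
    """Action: sets the flag and records it in the state's log."""
    def act(s):
        s[key] = True
        s.setdefault("log", []).append(key)
        return Status.SUCCESS
    return act

# Hypothetical toy procedure: confirm identity, ensure IV access, then give the drug.
give_medication = Sequence(
    Leaf("patient identified?", check("patient_identified")),
    Fallback(
        Leaf("IV access present?", check("iv_access")),
        Leaf("place IV line", do("iv_access")),
    ),
    Leaf("administer drug", do("drug_given")),
)

state = {"patient_identified": True}
print(give_medication.tick(state), state["log"])  # Status.SUCCESS ['iv_access', 'drug_given']
```

Because every node exposes the same `tick` interface, subtrees such as the IV-access fallback can be reused inside other procedures. A trace of which leaves ran, like the `log` here, is one way a tree could support the monitoring and auditing the abstract mentions.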