Agents
Learning to Dynamically Coordinate Multi-Robot Teams in Graph Attention Networks
Wang, Zheyuan, Gombolay, Matthew
Increasing interest in integrating advanced robotics within manufacturing has spurred a renewed focus on developing real-time scheduling solutions to coordinate human-robot collaboration in this environment. Traditionally, the problem of scheduling agents to complete tasks with temporal and spatial constraints has been approached either with exact algorithms, which are computationally intractable for large-scale, dynamic coordination, or approximate methods that require domain experts to craft heuristics for each application. We seek to overcome the limitations of these conventional methods by developing a novel graph attention network formulation to automatically learn features of scheduling problems to allow their deployment. To learn effective policies for combinatorial optimization problems via machine learning, we combine imitation learning on smaller problems with deep Q-learning on larger problems, in a nonparametric framework, to allow for fast, near-optimal scheduling of robot teams. We show that our network-based policy finds at least twice as many solutions as prior state-of-the-art methods in all testing scenarios. I. INTRODUCTION Advances in robotic technology are enabling the introduction of mobile robots into manufacturing environments alongside human workers. By removing the cage around traditional robot platforms and integrating dynamic, final assembly operations with human-robot teams, manufacturers can see improvements in reducing a factory's footprint and environmental costs, as well as increased productivity [1].
Game Description Logic with Integers: A GDL Numerical Extension
Mittelmann, Munyque, Perrussel, Laurent
Many problems can be viewed as games, where one or more agents try to ensure that certain objectives hold no matter the behavior of the environment and other agents. In recent years, a number of logical formalisms have been proposed for specifying games, among which the Game Description Language (GDL) was established as the official language for General Game Playing. Although numbers are recurring in games, the description of games with numerical features in GDL requires the enumeration of all possible numeric values and the relation among them. Thereby, in this paper, we introduce the Game Description Logic with Integers (GDLZ) to describe games with numerical variables, numerical parameters, as well as to perform numerical comparisons. We compare our approach with GDL and show that when describing the same game, GDLZ is more compact.
Decentralised Sparse Multi-Task Regression
Richards, Dominic, Negahban, Sahand N., Rebeschini, Patrick
We consider a sparse multi-task regression framework for fitting a collection of related sparse models. Representing models as nodes in a graph with edges between related models, a framework that fuses lasso regressions with the total variation penalty is investigated. Under a form of restricted eigenvalue assumption, bounds on prediction and squared error are given that depend upon the sparsity of each model and the differences between related models. This assumption relates to the smallest eigenvalue restricted to the intersection of two cone sets of the covariance matrix constructed from each of the agents' covariances. We show that this assumption can be satisfied if the constructed covariance matrix satisfies a restricted isometry property. In the case of a grid topology, high-probability bounds are given that match, up to log factors, the no-communication setting of fitting a lasso on each model, divided by the number of agents. A decentralised dual method that exploits a convex-concave formulation of the penalised problem is proposed to fit the models, and its effectiveness is demonstrated on simulations against the group lasso and variants.
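As a rough illustration of the kind of objective described above, the sketch below evaluates a per-node lasso loss plus an l1 total variation penalty on the differences between models joined by an edge. The exact form of the objective, the function name `fused_lasso_objective`, and the weights `lam_sparse` and `lam_tv` are assumptions made for illustration; the abstract does not state them.

```python
def l1_norm(vector):
    """Sum of absolute values of the entries."""
    return sum(abs(x) for x in vector)


def fused_lasso_objective(data, betas, edges, lam_sparse, lam_tv):
    """Evaluate an assumed lasso + total variation objective on a graph.

    data:  {node: (X, y)} where X is a list of feature rows and y a list of targets
    betas: {node: list of coefficients}, one model per node
    edges: iterable of (node_u, node_v) pairs linking related models
    """
    total = 0.0
    for node, (X, y) in data.items():
        beta = betas[node]
        # Squared-error loss of this node's model, scaled by 1 / (2 * n_samples).
        residuals = [
            y_i - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
            for row, y_i in zip(X, y)
        ]
        total += sum(r * r for r in residuals) / (2 * len(y))
        # Lasso term: encourages each model to be sparse.
        total += lam_sparse * l1_norm(beta)
    for u, v in edges:
        # Total variation term: encourages related models to agree.
        difference = [a - b for a, b in zip(betas[u], betas[v])]
        total += lam_tv * l1_norm(difference)
    return total
```

Under these assumptions, a larger `lam_tv` pulls neighbouring models together, while `lam_sparse` controls the sparsity of each individual model.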
Optimal Farsighted Agents Tend to Seek Power
Some researchers have speculated that capable reinforcement learning (RL) agents pursuing misspecified objectives are often incentivized to seek resources and power in pursuit of those objectives. An agent seeking power is incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: humans seem idiosyncratic in their urges to power, which need not be present in the agents we design. We formalize a notion of power within the context of finite deterministic Markov decision processes (MDPs). We prove that, with respect to a wide class of reward function distributions, optimal policies tend to seek power over the environment.
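The abstract does not spell out its formal definition of power, so the following is only a hypothetical proxy: the optimal value of a state in a finite deterministic MDP, averaged over reward functions sampled uniformly at random. The names `optimal_values` and `average_optimal_value`, the state-based rewards, the uniform reward distribution, and the discount of 0.9 are all assumptions made for this sketch.

```python
import random


def optimal_values(successors, reward, gamma, sweeps=200):
    """Value iteration for a deterministic MDP with state-based rewards.

    successors: {state: list of states reachable in one step}
    reward:     {state: reward received in that state}
    """
    values = {state: 0.0 for state in successors}
    for _ in range(sweeps):
        values = {
            state: reward[state]
            + gamma * max(values[nxt] for nxt in successors[state])
            for state in successors
        }
    return values


def average_optimal_value(successors, gamma=0.9, samples=500, seed=0):
    """Average optimal value per state over uniformly sampled reward functions."""
    rng = random.Random(seed)
    totals = {state: 0.0 for state in successors}
    for _ in range(samples):
        # Assumption: rewards are drawn independently from Uniform[0, 1].
        reward = {state: rng.random() for state in successors}
        values = optimal_values(successors, reward, gamma)
        for state in successors:
            totals[state] += values[state]
    return {state: totals[state] / samples for state in successors}


# Toy environment: "hub" can reach three absorbing states, "corridor" only one.
toy_mdp = {
    "hub": ["a", "b", "c"],
    "corridor": ["a"],
    "a": ["a"],
    "b": ["b"],
    "c": ["c"],
}
proxy = average_optimal_value(toy_mdp)
```

In this toy environment the state with more reachable options scores higher on the proxy, which is the intuition behind the claim that optimal policies tend to keep options open.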
A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems
Simões, Marco A. C., da Silva, Robson Marinho, Nogueira, Tatiane
To achieve these common goals, agents in a MAS should be capable of interacting with other agents, not simply by exchanging data, but by engaging in social activities, such as those in which people participate in their daily lives: cooperation, coordination, negotiation, and the like. In MASs, agents are assumed to be autonomous, that is, capable of making independent decisions about what to do in order to satisfy their design objectives, and thus they need mechanisms that allow them to synchronize and to coordinate their activities at run time [31]. Although one of the main issues in MASs is the agents' coordination structure, this is not hard-wired at design time, as it typically is in standard concurrent/distributed systems. One well-known strategy for coordination in MAS is the design of multi-agent coordinated plans [7][35][36][33][14] that include not only the usual agents' actions defined by their effectors, but also communication actions to achieve the necessary synchronization and coordination. To represent communication actions, some specific languages were created, e.g.
A Simulation Model for Pedestrian Crowd Evacuation Based on Various AI Techniques
Muhammed, Danial A., Saeed, Soran A. M., Rashid, Tarik A.
This paper attempts to design an intelligent simulation model for pedestrian crowd evacuation. For this purpose, the cellular automata (CA) model was fully integrated with fuzzy logic, the kth nearest neighbors (KNN), and some statistical equations. In this model, each pedestrian was assigned a specific speed, according to his/her physical, biological and emotional features. The emergency behavior and evacuation efficiency of each pedestrian were evaluated by coupling his/her speed with various elements, such as environment, pedestrian distribution and familiarity with the exits. These elements all have great impacts on the evacuation process. Several experiments were carried out to verify the performance of the model in different emergency scenarios. The results show that the proposed model can predict the evacuation time and emergency behavior in various types of building interiors and pedestrian distributions. The research provides a good reference to the design of building evacuation systems.
BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication)
Rosa, Marek, Afanasjeva, Olga, Andersson, Simon, Davidson, Joseph, Guttenberg, Nicholas, Hlubuček, Petr, Poliak, Martin, Vítku, Jaroslav, Feyereisl, Jan
An architecture and a learning procedure where:
- An agent is made up of many experts.
- All experts share the same communication policy (expert policy), but have different internal memory states.
- There are two levels of learning, an inner loop (with a communication stage) and an outer loop.
- Inner loop: Agent's behavior and adaptation should emerge as a result of experts communicating with each other. Experts send messages (of any complexity) to each other and update their internal states based on observations/messages and their internal state from the previous time-step. Expert policy is fixed and does not change during the inner loop. Inner loop loss need not even be a proper loss function. It can be any kind of structured feedback guiding the adaptation during the agent's lifetime.
- Outer loop: An expert policy is discovered over generations of agents, ensuring that strategies that find solutions to problems in diverse environments can quickly emerge in the inner loop.
- Agent's objective is to adapt fast to novel tasks.
Exhibiting the following novel properties:
- Roles of experts and connectivity among them assigned dynamically at inference time.
- Learned communication protocol with context dependent messages of varied complexity.
- Generalizes to different numbers and types of inputs/outputs.
- Can be trained to handle variations in architecture during both training and testing.
Initial empirical results show generalization and scalability along the spectrum of learning types.
SafeLife 1.0: Exploring Side Effects in Complex Environments
Wainwright, Carroll L., Eckersley, Peter
We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated levels with many opportunities for unsafe behavior. Agents are graded both on their ability to maximize their explicit reward and on their ability to operate safely without unnecessary side effects. We train agents to maximize rewards using proximal policy optimization and score them on a suite of benchmark levels. The resulting agents are performant but not safe (they tend to cause large side effects in their environments), but they form a baseline against which future safety research can be measured.
Learning Agent Communication under Limited Bandwidth by Message Pruning
Mao, Hangyu, Zhang, Zhengchao, Xiao, Zhen, Gong, Zhibo, Ni, Yan
Communication is a crucial factor for the big multi-agent world to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been applied to learn the communication strategy and the control policy for multiple agents. However, the practical limited bandwidth in multi-agent communication has been largely ignored by the existing DRL methods. Specifically, many methods keep sending messages incessantly, which consumes too much bandwidth. As a result, they are inapplicable to multi-agent systems with limited bandwidth. To handle this problem, we propose a gating mechanism to adaptively prune less beneficial messages. We evaluate the gating mechanism on several tasks. Experiments demonstrate that it can prune a lot of messages with little impact on performance. In fact, the performance may be greatly improved by pruning redundant messages. Moreover, the proposed gating mechanism is applicable to several previous methods, equipping them with the ability to address bandwidth-restricted settings.
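The idea of gating messages can be sketched in a few lines. This is a minimal illustration, not the paper's mechanism: in the paper the gate is learned, whereas here the gate scores are plain numbers supplied by the caller, and the names `prune_messages`, `bandwidth_saved`, and the threshold of 0.5 are hypothetical.

```python
def prune_messages(messages, gate_scores, threshold=0.5):
    """Keep only the messages whose gate score reaches the threshold.

    messages:    {agent_id: message payload}
    gate_scores: {agent_id: score in [0, 1]}; assumed to come from a learned gate
    """
    return {
        agent: payload
        for agent, payload in messages.items()
        if gate_scores[agent] >= threshold
    }


def bandwidth_saved(messages, kept):
    """Fraction of messages that were pruned and therefore never sent."""
    return 1.0 - len(kept) / len(messages)


# Hypothetical example: four agents, two of which pass the gate.
messages = {"a": [0.1], "b": [0.2], "c": [0.3], "d": [0.4]}
gate_scores = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1}
kept = prune_messages(messages, gate_scores)
saved = bandwidth_saved(messages, kept)
```

Raising the threshold trades communication for bandwidth: fewer messages are sent, and only those the gate rates as most beneficial survive.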
Artificial Intelligence for Low-Resource Communities: Influence Maximization in an Uncertain World
The potential of Artificial Intelligence (AI) to tackle challenging problems that afflict society is enormous, particularly in the areas of healthcare, conservation and public safety and security. Many problems in these domains involve harnessing social networks of under-served communities to enable positive change, e.g., using social networks of homeless youth to raise awareness about Human Immunodeficiency Virus (HIV) and other STDs. Unfortunately, most of these real-world problems are characterized by uncertainties about social network structure and influence models, and previous research in AI fails to sufficiently address these uncertainties. This thesis addresses these shortcomings by advancing the state-of-the-art to a new generation of algorithms for interventions in social networks. In particular, this thesis describes the design and development of new influence maximization algorithms which can handle various uncertainties that commonly exist in real-world social networks. These algorithms utilize techniques from sequential planning problems and social network theory to develop new kinds of AI algorithms. Further, this thesis also demonstrates the real-world impact of these algorithms by describing their deployment in three pilot studies to spread awareness about HIV among actual homeless youth in Los Angeles. This represents one of the first-ever deployments of computer-science-based influence maximization algorithms in this domain. Our results show that our AI algorithms improved upon the state-of-the-art by 160% in the real world. We discuss research and implementation challenges faced in deploying these algorithms, and lessons that can be gleaned for future deployment of such algorithms. The positive results from these deployments illustrate the enormous potential of AI in addressing societally relevant problems.