 Agents


Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning

arXiv.org Artificial Intelligence

Inefficient traffic control may cause numerous problems such as traffic congestion and energy waste. This paper proposes a novel multi-agent reinforcement learning method, named KS-DDPG (Knowledge Sharing Deep Deterministic Policy Gradient), to achieve optimal control by enhancing the cooperation between traffic signals. By introducing a knowledge-sharing-enabled communication protocol, each agent can access the collective representation of the traffic environment collected by all agents. The proposed method is evaluated in two experiments, using synthetic and real-world datasets respectively. The comparison with state-of-the-art reinforcement learning-based and conventional transportation methods demonstrates that the proposed KS-DDPG achieves significant efficiency in controlling large-scale transportation networks and coping with fluctuations in traffic flow. In addition, the introduced communication mechanism is shown to speed up the convergence of the model without significantly increasing the computational burden.


Learning to Communicate with Strangers via Channel Randomisation Methods

arXiv.org Artificial Intelligence

We introduce two methods for improving the performance of agents meeting for the first time to accomplish a communicative task. The methods are: (1) 'message mutation' during the generation of the communication protocol; and (2) random permutations of the communication channel. These proposals are tested using a simple two-player game involving a 'teacher' who generates a communication protocol and sends a message, and a 'student' who interprets the message. After training multiple agents via self-play, we analyse the performance of these agents when they are matched with a stranger, i.e. their zero-shot communication performance. We find that both message mutation and channel permutation positively influence performance, and we discuss their effects.
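
As a rough illustration of the second method, the sketch below shows one possible reading of "random permutations of the communication channel": the symbols of a discrete message are relabelled by a freshly sampled permutation, so neither agent can rely on a fixed symbol convention. The function names, the integer symbol encoding, and the `vocab_size` parameter are assumptions made for this example and are not taken from the paper.

```python
import random


def permute_channel(message, vocab_size, rng):
    """Relabel every symbol of `message` with one random permutation.

    `message` is a list of integer symbols in range(vocab_size).
    Returns the permuted message and the permutation that was applied.
    (Hypothetical helper: the paper's actual procedure may differ.)
    """
    perm = list(range(vocab_size))
    rng.shuffle(perm)  # uniformly random relabelling of the alphabet
    return [perm[symbol] for symbol in message], perm


def invert_permutation(perm):
    """Return the inverse mapping, so that inverse[perm[i]] == i."""
    inverse = [0] * len(perm)
    for original, relabelled in enumerate(perm):
        inverse[relabelled] = original
    return inverse


if __name__ == "__main__":
    rng = random.Random(0)
    sent = [0, 2, 2, 1]
    received, perm = permute_channel(sent, vocab_size=4, rng=rng)
    # Repeated symbols stay repeated; only their identity changes.
    print(sent, "->", received)
```

Because the relabelling is a bijection, the structure of the message (which positions share a symbol) is preserved while the symbol identities are not.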


Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

arXiv.org Artificial Intelligence

As more machine learning agents interact with humans, it is increasingly a prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm. Value alignment is a property of intelligent agents wherein they solely pursue non-harmful behaviors or human-beneficial goals. We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward. The normative behavior reward is derived from a value-aligned prior model previously shown to classify text as normative or non-normative. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative. We test our value-alignment technique on three interactive text-based worlds; each world is designed specifically to challenge agents with a task as well as provide opportunities to deviate from the task to engage in normative and/or altruistic behavior.
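
The idea of balancing two sources of guidance can be sketched with a common form of policy shaping, in which two action distributions are multiplied and renormalised. This is a minimal sketch under stated assumptions, not the authors' variant: the softmax conversion of scores, the `norm_temperature` balancing knob, and all function names are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - highest) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shape_policy(task_scores, norm_scores, norm_temperature=1.0):
    """Combine task and normative preferences over the same actions.

    Each score list has one entry per action. The two distributions are
    multiplied elementwise and renormalised. A larger (hypothetical)
    `norm_temperature` flattens the normative distribution and therefore
    weakens its influence on the result.
    """
    p_task = softmax(task_scores)
    p_norm = softmax(norm_scores, temperature=norm_temperature)
    product = [a * b for a, b in zip(p_task, p_norm)]
    total = sum(product)
    return [p / total for p in product]


if __name__ == "__main__":
    # Two actions that are equally good for the task; the second is
    # scored as more normative, so it ends up preferred.
    print(shape_policy([1.0, 1.0], [0.0, 2.0]))
```

When the task scores are tied, as in the usage example, the combined policy reduces to the normative distribution; when the normative scores are tied, it reduces to the task distribution.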


Agent-Centric Representations for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Object-centric representations have recently enabled significant progress in tackling relational reasoning tasks. By building a strong object-centric inductive bias into neural architectures, recent efforts have improved generalization and data efficiency of machine learning algorithms for these problems. One problem class involving relational reasoning that remains under-explored is multi-agent reinforcement learning (MARL). Here we investigate whether object-centric representations are also beneficial in the fully cooperative MARL setting. Specifically, we study two ways of incorporating an agent-centric inductive bias into our RL algorithm: (1) introducing an agent-centric attention module with explicit connections across agents; and (2) adding an agent-centric unsupervised predictive objective (i.e. one that does not use action labels), to be used as an auxiliary loss for MARL or as the basis of a pre-training step. We evaluate these approaches on the Google Research Football environment as well as DeepMind Lab 2D. Empirically, agent-centric representation learning leads to the emergence of more complex cooperation strategies between agents as well as enhanced sample efficiency and generalization.


Constraints Satisfiability Driven Reinforcement Learning for Autonomous Cyber Defense

arXiv.org Artificial Intelligence

With increasing system complexity and attack sophistication, the need for autonomous cyber defense becomes evident for cyber and cyber-physical systems (CPSs). Many existing state-of-the-art frameworks either rely on static models with unrealistic assumptions or fail to satisfy system safety and security requirements. In this paper, we present a new hybrid autonomous agent architecture that aims to optimize and verify defense policies of reinforcement learning (RL) by incorporating constraints verification (using satisfiability modulo theory (SMT)) into the agent's decision loop. The incorporation of SMT not only ensures the satisfiability of safety and security requirements, but also provides constant feedback to steer the RL decision-making toward safe and effective actions. This approach is critically needed for CPSs that exhibit high risk due to safety or security violations. Our evaluation of the presented approach in a simulated CPS environment shows that the agent learns the optimal policy quickly and defeats diversified attack strategies in 99% of cases.


Revisiting the Complexity Analysis of Conflict-Based Search: New Computational Techniques and Improved Bounds

arXiv.org Artificial Intelligence

The problem of Multi-Agent Path Finding (MAPF) calls for finding a set of conflict-free paths for a fleet of agents operating in a given environment. Arguably, the state-of-the-art approach to computing optimal solutions is Conflict-Based Search (CBS). In this work we revisit the complexity analysis of CBS to provide tighter bounds on the algorithm's run-time in the worst case. Our analysis paves the way to better pinpoint the parameters that govern (in the worst case) the algorithm's computational complexity. Our analysis is based on two complementary approaches. In the first approach we bound the run-time using the size of a Multi-valued Decision Diagram (MDD), a layered graph that compactly contains all possible single-agent paths between two given vertices for a specific path length. In the second approach we express the running time by a novel recurrence relation which bounds the algorithm's complexity. We use generating-function-based analysis in order to tightly bound the recurrence. Using these techniques we provide several new upper bounds on CBS's complexity. The results allow us to improve the existing bound on the running time of CBS for many cases. For example, on a set of common benchmarks we improve the upper bound by a factor of at least $2^{10^{7}}$.
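
To make the MDD description concrete, here is a minimal sketch of how such a layered graph could be built for one agent. It assumes an undirected graph given as an adjacency dictionary and assumes that an agent may wait in place for a time step, which is the usual MAPF convention but is not stated in the abstract. The names `bfs_distances` and `build_mdd` are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_mdd(graph, start, goal, cost):
    """Build the MDD of all start-to-goal paths with exactly `cost` steps.

    Layer t holds every vertex v that can be reached from `start` within
    t steps and from which `goal` can be reached within cost - t steps.
    Returns (layers, edges); edges join (u, t) to (v, t + 1).
    """
    from_start = bfs_distances(graph, start)
    to_goal = bfs_distances(graph, goal)  # valid because graph is undirected
    inf = float("inf")
    layers = [
        {v for v in graph
         if from_start.get(v, inf) <= t and to_goal.get(v, inf) <= cost - t}
        for t in range(cost + 1)
    ]
    edges = []
    for t in range(cost):
        for u in layers[t]:
            for v in list(graph[u]) + [u]:  # move to a neighbour or wait
                if v in layers[t + 1]:
                    edges.append(((u, t), (v, t + 1)))
    return layers, edges


if __name__ == "__main__":
    line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    layers, edges = build_mdd(line, "a", "c", cost=3)
    for t, layer in enumerate(layers):
        print(t, sorted(layer))
```

With `cost` equal to the shortest-path length, each layer contains only vertices on shortest paths; larger values of `cost` widen the middle layers, which is why the size of the diagram is a natural quantity to bound.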


Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper, we propose actor-critic approaches that introduce an actor policy into QMIX [9], removing the monotonicity constraint of QMIX and implementing a non-monotonic value function factorization of the joint action-value. We evaluate our actor-critic methods on StarCraft II micromanagement tasks and show that they achieve stronger performance on maps with heterogeneous agent types.


Amanda Prorok's talk – Learning to Communicate in Multi-Agent Systems (with video)

Robohub

In this technical talk, Amanda Prorok, Assistant Professor in the Department of Computer Science and Technology at Cambridge University, and a Fellow of Pembroke College, discusses her team's latest research on what, how and when information needs to be shared among agents that aim to solve cooperative tasks. Effective communication is key to successful multi-agent coordination. Yet it is far from obvious what, how and when information needs to be shared among agents that aim to solve cooperative tasks. In this talk, I discuss our recent work on using Graph Neural Networks (GNNs) to solve multi-agent coordination problems. In my first case-study, I show how we use GNNs to find a decentralized solution to the multi-agent path finding problem, which is known to be NP-hard.


A Robust Model for Trust Evaluation during Interactions between Agents in a Sociable Environment

arXiv.org Artificial Intelligence

Trust evaluation is an important topic in both research and applications in sociable environments. This paper presents a model for trust evaluation between agents that combines direct trust, indirect trust through neighbouring links, and the reputation of an agent in the environment (i.e. the social network) to provide a robust evaluation. Our approach is independent of the topology of the social network and operates in a decentralized manner without a central controller, so it can be used in a broad range of domains.


Planning with Expectation Models for Control

arXiv.org Artificial Intelligence

In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model could produce the expectation of the next feature vector rather than the full distribution, or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary, and the model is approximate, such as when it is learned using function approximation. In these cases a full distribution model may be impractical and a sample model may be either more expensive computationally or of high variance. Wan et al. considered only planning for prediction to evaluate a fixed policy. In this paper, we treat the control case - planning to improve and find a good approximate policy. We prove that planning with an expectation model must update a state-value function, not an action-value function as previously suggested (e.g., Sorg & Singh, 2010). This opens the question of how planning influences action selections. We consider three strategies for this and present general MBRL algorithms for each. We identify the strengths and weaknesses of these algorithms in computational experiments. Our algorithms and experiments are the first to treat MBRL with expectation models in a general setting.
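
A single planning step on a state-value function can be sketched as follows. This is an illustration under several assumptions that the abstract does not state: the value function is linear in the features, the expectation model for each action is a matrix `F` giving the expected next feature vector together with a vector `b` giving the expected reward, and the backup takes a maximum over actions. It is not claimed to be any of the three strategies the paper studies, and all names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


def planning_update(w, x, models, gamma=0.9, alpha=0.1):
    """One planning backup of a linear state-value function v(x) = w . x.

    `models` maps each action to a pair (F, b): F @ x is the expected
    next feature vector and b . x is the expected reward. The target is
    the best one-step lookahead value under the model.
    """
    target = max(
        dot(b, x) + gamma * dot(w, matvec(F, x))
        for F, b in models.values()
    )
    error = target - dot(w, x)
    # Semi-gradient step: move v(x) toward the model-based target.
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


if __name__ == "__main__":
    models = {
        "stay": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
        "go": ([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0]),
    }
    w = planning_update([0.0, 0.0], [1.0, 0.0], models)
    print(w)
```

Only the state-value weights `w` are updated here; the model is queried per action solely to form the lookahead target.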