 Agent Societies


Harfang3D Dog-Fight Sandbox: A Reinforcement Learning Research Platform for the Customized Control Tasks of Fighter Aircrafts

arXiv.org Artificial Intelligence

The advent of deep learning (DL) gave rise to significant breakthroughs in Reinforcement Learning (RL) research. Deep Reinforcement Learning (DRL) algorithms have reached super-human skill levels when applied to vision-based control problems such as Atari 2600 games, where environment states are extracted from pixel information. Unfortunately, these environments are far from applicable to highly dynamic and complex real-world tasks such as the autonomous control of a fighter aircraft, since they involve only a 2D representation of a visual world. Here, we present Harfang3D Dog-Fight Sandbox, a semi-realistic flight simulation environment for fighter aircraft. It is intended as a flexible toolbox for investigating the main challenges in aviation studies using Reinforcement Learning. The program provides easy access to the flight dynamics model, environment states, and aerodynamics of the plane, enabling users to customize any specific task in order to build intelligent decision-making (control) systems via RL. The software also allows the deployment of bot aircraft and the development of multi-agent tasks. This way, multiple groups of aircraft can be configured as competitive or cooperative agents to perform complicated tasks, including Dog Fight. During the experiments, we carried out training for two different scenarios: navigating to a designated location and within-visual-range (WVR) combat, or Dog Fight for short. Using Deep Reinforcement Learning techniques for both scenarios, we were able to train competent agents that exhibit human-like behaviours. Based on these results, it is confirmed that Harfang3D Dog-Fight Sandbox can be utilized as a realistic 3D RL research platform.


Near-Optimal Multi-Agent Learning for Safe Coverage Control

arXiv.org Artificial Intelligence

In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known $\textit{a priori}$, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to $\textit{a priori}$ unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MacOpt, a novel algorithm that efficiently handles the exploration-exploitation trade-off arising from partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SafeMac for safe coverage and exploration. We analyze SafeMac and give first-of-its-kind results: near-optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a bio-diversity monitoring task under safety constraints, where SafeMac outperforms competing methods.


Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability

arXiv.org Artificial Intelligence

The state-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems. Yet, these methods all assume that agents perform synchronized primitive-action executions so that they are not genuinely scalable to long-horizon real-world multi-agent/robot tasks that inherently require agents/robots to asynchronously reason about high-level action selection at varying time durations. The Macro-Action Decentralized Partially Observable Markov Decision Process (MacDec-POMDP) is a general formalization for asynchronous decision-making under uncertainty in fully cooperative multi-agent tasks. In this thesis, we first propose a group of value-based RL approaches for MacDec-POMDPs, where agents are allowed to perform asynchronous learning and decision-making with macro-action-value functions in three paradigms: decentralized learning and control, centralized learning and control, and centralized training for decentralized execution (CTDE). Building on the above work, we formulate a set of macro-action-based policy gradient algorithms under the three training paradigms, where agents are allowed to directly optimize their parameterized policies in an asynchronous manner. We evaluate our methods both in simulation and on real robots over a variety of realistic domains. Empirical results demonstrate the superiority of our approaches in large multi-agent problems and validate the effectiveness of our algorithms for learning high-quality and asynchronous solutions with macro-actions.


Incentivising cooperation by rewarding the weakest member

arXiv.org Artificial Intelligence

Autonomous agents that interact with each other on behalf of humans are becoming more common in many social domains, such as customer service, transportation, and health care. In such social situations, greedy strategies can reduce the positive outcome for all agents, for example by leading to stop-and-go traffic on highways or causing a denial of service on a communications channel. Instead, we desire autonomous decision-making that performs efficiently while also considering the equitability of the group, to avoid these pitfalls. Unfortunately, in complex situations it is far easier to design machine learning objectives for selfish strategies than for equitable behaviors. Here we present a simple way to reward groups of agents in both evolution and reinforcement learning domains by the performance of their weakest member. We show how this yields "fairer", more equitable behavior while also maximizing individual outcomes, and we show the relationship to the biological selection mechanisms of group-level selection and inclusive fitness theory.
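
The core idea of rewarding a group by its weakest member can be illustrated with a minimal sketch. The function names and the example scores below are hypothetical and are not taken from the paper; the sketch only shows one plausible reading, in which every agent in a group receives the minimum individual score as its reward.

```python
from typing import Dict, List


def weakest_member_reward(individual_scores: List[float]) -> float:
    """Return the group reward as the score of the weakest member (assumed reading)."""
    if not individual_scores:
        raise ValueError("a group needs at least one member")
    return min(individual_scores)


def assign_group_rewards(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Give every agent in a group the same reward: that group's minimum score."""
    rewards = {}
    for name, scores in groups.items():
        shared = weakest_member_reward(scores)
        rewards[name] = [shared] * len(scores)
    return rewards


# Hypothetical scores: one group is uneven, the other is balanced.
example_groups = {"uneven": [9.0, 1.0, 8.0], "balanced": [5.0, 5.0, 6.0]}
example_rewards = assign_group_rewards(example_groups)
```

Under this scheme the balanced group is rewarded more than the uneven one even though its total score is lower, which is the incentive toward equitable behavior that the abstract describes.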


Creating Emergent Behaviors with Reinforcement Learning and Unreal Engine

#artificialintelligence

In the following article I discuss how to generate emergent behavior in AI characters using Unreal Engine, Reinforcement Learning, and the free machine learning plugin MindMaker. The aim is that the interested reader can use this as a guide for creating emergent behavior in their own game project or embodied AI character. Emergent behavior refers to behaviors that are not pre-programmed but develop organically in response to some environmental stimuli. Emergent behavior is common to many if not all forms of life, being a function of evolution itself. It is also more recently a feature of embodied artificial agents. When one employs emergent behavior methods, one does not rigidly program specific actions for the AI, but instead allows them to "evolve" through some adaptive algorithm such as genetic programming, reinforcement learning, or Monte Carlo methods.


Multi-Agent Chance-Constrained Stochastic Shortest Path with Application to Risk-Aware Intelligent Intersection

arXiv.org Artificial Intelligence

In transportation networks, where traffic lights have traditionally been used for vehicle coordination, intersections act as natural bottlenecks. A formidable challenge for existing automated intersections lies in detecting and reasoning about uncertainty from the operating environment and human-driven vehicles. In this paper, we propose a risk-aware intelligent intersection system for autonomous vehicles (AVs) as well as human-driven vehicles (HVs). We cast the problem as a novel class of Multi-agent Chance-Constrained Stochastic Shortest Path (MCC-SSP) problems and devise an exact Integer Linear Programming (ILP) formulation that is scalable in the number of agents' interaction points (e.g., potential collision points at the intersection). In particular, when the number of agents within an interaction point is small, which is often the case in intersections, the ILP has a polynomial number of variables and constraints. To further improve the running time, we show that the collision risk computation can be performed offline. Additionally, a trajectory optimization workflow is provided to generate risk-aware trajectories for any given intersection. The proposed framework is implemented in the CARLA simulator and evaluated under a fully autonomous intersection with AVs only as well as in a hybrid setup with a signalized intersection for HVs and an intelligent scheme for AVs. As verified via simulations, the featured approach improves the intersection's efficiency by up to $200\%$ while also conforming to the specified tunable risk threshold.


Automated Performance Estimation for Decentralized Optimization via Network Size Independent Problems

arXiv.org Artificial Intelligence

We develop a novel formulation of the Performance Estimation Problem (PEP) for decentralized optimization whose size is independent of the number of agents in the network. The PEP approach allows the worst-case performance and worst-case instance of first-order optimization methods to be computed automatically by solving an SDP. Unlike previous work, the size of our new PEP formulation is independent of the network size. For this purpose, we take a global view of the decentralized problem and decouple the consensus subspace from its orthogonal complement. We apply our methodology to different decentralized methods such as DGD, DIGing, and EXTRA and obtain numerically tight performance guarantees that are valid for any network size.
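
For readers unfamiliar with the methods being analyzed, the following is a minimal sketch of the standard decentralized gradient descent (DGD) update, in which each agent averages its neighbors' estimates with a mixing matrix and then takes a local gradient step. It does not reproduce the PEP formulation from the paper; the mixing matrix, the quadratic local objectives, and the step size are illustrative assumptions.

```python
from typing import Callable, List


def dgd(
    W: List[List[float]],
    grads: List[Callable[[float], float]],
    x0: List[float],
    step: float,
    iters: int,
) -> List[float]:
    """Run scalar DGD: x_i <- sum_j W[i][j] * x_j - step * grad_i(x_i)."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        # Consensus step: each agent mixes the current estimates of all agents.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step, evaluated at the agent's previous estimate.
        x = [mixed[i] - step * grads[i](x[i]) for i in range(n)]
    return x


# Assumed example: agent i minimizes 0.5 * (x - b_i)^2, so the global
# minimizer of the sum is the mean of the targets b_i (here 3.0).
targets = [0.0, 3.0, 6.0]
grads = [lambda x, b=b: x - b for b in targets]
# Assumed symmetric, doubly stochastic mixing matrix for three agents.
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
estimates = dgd(W, grads, [0.0, 0.0, 0.0], step=0.05, iters=500)
```

With a constant step size, the agents in this example settle near the global minimizer rather than exactly on it. Worst-case analysis tools such as PEP are designed to quantify this kind of residual error.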


A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings

arXiv.org Artificial Intelligence

Autonomously operating learning agents are becoming more common and this trend is likely to continue accelerating for a variety of reasons. First, cheap sensors, actuators, and high-speed wireless internet have drastically lowered the barrier to deploy an autonomous system. Second, autonomy creates the possibility of learning "on device", keeping experience local and off of any central servers. This makes it easier to comply with privacy requirements (Kairouz et al., 2019) and increases robustness by removing a single point of failure. Third, the autonomous approach is a potentially better fit for never-ending life-long learning (Platanios et al., 2019) since it does not require periodic syncing with updated centralized models. Indeed fully autonomous agents do not require any train-test separation at all, a property thought to be important for establishing open-ended autocurricula (Leibo et al., 2019; Stanley, 2019). However, the presence of multiple interacting autonomous systems raises a host of new challenges. Autonomously operating learning agents must be robust to the presence of other learning agents in their environment (e.g.


Constrained Multi-Agent Path Finding on Directed Graphs

arXiv.org Artificial Intelligence

We discuss C-MP and C-MAPF, generalizations of the classical Motion Planning (MP) and Multi-Agent Path Finding (MAPF) problems on a directed graph G. Namely, we enforce an upper bound on the number of agents that occupy each member of a family of vertex subsets. For instance, this constraint allows maintaining a safety distance between agents. We prove that finding a feasible solution of C-MP and C-MAPF is NP-hard, and we propose a reduction method to convert them to standard MP and MAPF. This reduction method consists in finding a subset of nodes W and a reduced graph G/W, such that a solution of MAPF on G/W provides a solution of C-MAPF on G. Moreover, we study the problem of finding W of maximum cardinality, which is strongly NP-hard.
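
The occupancy constraint described above can be sketched as a simple feasibility check on a single configuration of agents. The function name, the data layout, and the path-graph example are hypothetical and only illustrate the constraint itself, not the reduction method proposed in the paper.

```python
from typing import Hashable, List, Sequence, Set


def satisfies_occupancy_bounds(
    positions: Sequence[Hashable],
    subsets: List[Set[Hashable]],
    bounds: List[int],
) -> bool:
    """Check that each vertex subset holds at most its allowed number of agents.

    positions[a] is the vertex currently occupied by agent a.
    """
    if len(subsets) != len(bounds):
        raise ValueError("each subset needs exactly one bound")
    for subset, bound in zip(subsets, bounds):
        occupied = sum(1 for vertex in positions if vertex in subset)
        if occupied > bound:
            return False
    return True


# Assumed example: vertices 0..4 on a path. Allowing at most one agent in
# every pair of adjacent vertices keeps agents at least two edges apart.
adjacent_pairs = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
one_agent_each = [1, 1, 1, 1]
spaced_out = satisfies_occupancy_bounds([0, 2, 4], adjacent_pairs, one_agent_each)
too_close = satisfies_occupancy_bounds([0, 1], adjacent_pairs, one_agent_each)
```

In the example, the configuration with agents on vertices 0, 2, and 4 is feasible, while placing agents on the adjacent vertices 0 and 1 violates the bound on the subset {0, 1}.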


Hierarchical Cyclic Pursuit: Algebraic Curves Containing the Laplacian Spectra

arXiv.org Artificial Intelligence

The paper addresses the problem of multi-agent communication in networks with a regular directed ring structure. These can be viewed as hierarchical extensions of the classical cyclic pursuit topology. We show that the spectra of the corresponding Laplacian matrices allow exact localization on the complex plane. Furthermore, we derive a general form of the characteristic polynomial of such matrices, analyze the algebraic curves to which its roots belong, and propose a way to obtain their closed-form equations. In combination with frequency-domain consensus criteria for high-order SISO linear agents, these curves enable one to analyze the feasibility of consensus in networks with a varying number of agents.
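
As a point of reference for the non-hierarchical base case, the sketch below builds the Laplacian of a plain directed ring (classical cyclic pursuit, with each agent following its successor) and checks the well-known fact that its eigenvalues are $1 - e^{2\pi i k/n}$, which lie on the unit circle centered at 1. The helper names are hypothetical, and the hierarchical topologies studied in the paper are not covered.

```python
import cmath
from typing import List, Tuple


def ring_laplacian(n: int) -> List[List[int]]:
    """Laplacian of a directed ring in which agent i follows agent (i + 1) mod n."""
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] += 1
        L[i][(i + 1) % n] -= 1
    return L


def ring_eigenpairs(n: int) -> List[Tuple[complex, List[complex]]]:
    """Closed-form eigenpairs: lambda_k = 1 - w^k, v_k[j] = w^(j*k), w = exp(2*pi*i/n)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [(1 - w ** k, [w ** (j * k) for j in range(n)]) for k in range(n)]


def residual(L: List[List[int]], lam: complex, v: List[complex]) -> float:
    """Largest entry of |L v - lam v|, which is near zero for a true eigenpair."""
    n = len(v)
    return max(
        abs(sum(L[i][j] * v[j] for j in range(n)) - lam * v[i]) for i in range(n)
    )


n_agents = 6
L_ring = ring_laplacian(n_agents)
pairs = ring_eigenpairs(n_agents)
worst_residual = max(residual(L_ring, lam, v) for lam, v in pairs)
# Distance of each eigenvalue from the circle |z - 1| = 1.
worst_offset = max(abs(abs(lam - 1) - 1) for lam, _ in pairs)
```

Both `worst_residual` and `worst_offset` should be zero up to floating-point error, confirming that the spectrum of this base topology sits on a single closed curve regardless of the number of agents.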