Agents
Evaluating Inter-Operator Cooperation Scenarios to Save Radio Access Network Energy
Marjou, Xavier, Gléau, Tangui Le, Messié, Vincent, Radier, Benoit, Lemlouma, Tayeb, Fromentoux, Gaël
Reducing energy consumption is crucial to reducing humanity's debt to our planet. Therefore, most companies try to reduce their energy consumption while taking care to preserve the service delivered to their customers. To do so, a service provider (SP) typically downscales or shuts down part of its infrastructure during periods of low activity, when only a few customers need the service. However, an SP still needs to keep part of its infrastructure "on", which still requires significant energy. For example, a mobile network operator (MNO) needs to keep most of its radio access network (RAN) active. Could an SP do better by cooperating with other SPs who would temporarily support its users, thus allowing it to temporarily shut down its infrastructure, and then reciprocate during another low-activity period? To answer this question, we investigated a novel collaboration framework based on multi-agent reinforcement learning (MARL) that allows negotiations between SPs and uses trustworthy reports from a distributed ledger technology (DLT) to evaluate the amount of energy saved. We used it to experiment with three different sets of rules (free, recommended, or imposed) regulating the negotiation between multiple SPs (3, 4, 8, or 10). With respect to four cooperation metrics (efficiency, safety, incentive-compatibility, and fairness), the simulations showed that the imposed set of rules performed best.
Asymptotic Tracking Control of Uncertain MIMO Nonlinear Systems with Less Conservative Controllability Conditions
Zhou, Bing, Huang, Xiucai, Song, Yongduan
For uncertain multi-input multi-output (MIMO) nonlinear systems, it is nontrivial to achieve asymptotic tracking, and most existing methods demand controllability conditions that are rather restrictive or even impractical if unexpected actuator faults are involved. In this note, we present a method capable of achieving zero-error steady-state tracking under a less conservative (more practical) controllability condition. By incorporating a novel Nussbaum gain technique and a positive integrable function into the control design, we develop a robust adaptive asymptotic tracking control scheme for systems whose time-varying control gain is unknown in both magnitude and direction. By resorting to the existence of a feasible auxiliary matrix, the current state-of-the-art controllability condition is further relaxed, which enlarges the class of systems that the proposed control scheme can handle. All the closed-loop signals are ensured to be globally ultimately uniformly bounded. Moreover, the control methodology is extended to the case involving intermittent actuator faults, with application to robotic systems. Finally, simulation studies are carried out to demonstrate the effectiveness and flexibility of this method.
Deep Reinforcement Learning for Multi-Agent Interaction
Ahmed, Ibrahim H., Brewitt, Cillian, Carlucho, Ignacio, Christianos, Filippos, Dunion, Mhairi, Fosong, Elliot, Garcin, Samuel, Guo, Shangmin, Gyevnar, Balint, McInroe, Trevor, Papoudakis, Georgios, Rahman, Arrasy, Schäfer, Lukas, Tamborski, Massimiliano, Vecchio, Giuseppe, Wang, Cheng, Albrecht, Stefano V.
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
Decentralized Learning With Limited Communications for Multi-robot Coverage of Unknown Spatial Fields
Nakamura, Kensuke, Santos, María, Leonard, Naomi Ehrich
This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are aggregated locally by establishing communications through the boundaries of a Voronoi partition. We present an algorithm whereby each robot runs a local Gaussian process calculated from its own measurements and those provided by its Voronoi neighbors, which are incorporated into the individual robot's Gaussian process only if they provide sufficiently novel information. The performance of the algorithm is evaluated in simulation and compared with centralized approaches.
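The idea of incorporating a neighbor's sample only if it is sufficiently novel can be illustrated with a minimal sketch. The abstract does not state the novelty criterion, so everything here is an assumption: the sketch accepts a sample location only when the Gaussian process posterior variance at that location exceeds a threshold, and the kernel, `length_scale`, `noise`, `threshold`, and all function names are hypothetical choices rather than the authors' method.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 2-D points (assumed kernel)."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def posterior_variance(x_new, xs, noise=1e-3):
    """GP posterior variance at x_new given the sampled locations xs.

    The variance depends only on where samples were taken, not on the
    measured values, so measurements are omitted from this sketch.
    """
    if not xs:
        return rbf(x_new, x_new)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(x_new, a) for a in xs]
    alpha = solve(K, k)
    return rbf(x_new, x_new) - sum(ki * ai for ki, ai in zip(k, alpha))

def accept_novel(own, offered, threshold=0.1):
    """Keep an offered sample location only if the local GP is still uncertain there."""
    kept = list(own)
    for p in offered:
        if posterior_variance(p, kept) > threshold:
            kept.append(p)
    return kept

own_samples = [(0.5, 0.5)]
from_neighbor = [(0.5, 0.51), (0.9, 0.9)]  # first is nearly redundant, second is far away
merged = accept_novel(own_samples, from_neighbor)
```

In this toy run the nearly duplicate location is rejected and the distant one is kept, which keeps each robot's local Gaussian process small without discarding informative samples.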
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL
Kuba, Jakub Grudzien, Feng, Xidong, Ding, Shiyao, Dong, Hao, Wang, Jun, Yang, Yaodong
The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community. However, many research endeavours have been focused on developing practical MARL algorithms whose effectiveness has been studied only empirically, thereby lacking theoretical guarantees. As recent studies have revealed, MARL methods often achieve performance that is unstable in terms of reward monotonicity or suboptimal at convergence. To resolve these issues, in this paper, we introduce a novel framework named Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for MARL algorithmic designs. We prove that algorithms derived from the HAML template satisfy the desired properties of the monotonic improvement of the joint reward and the convergence to Nash equilibrium. We verify the practicality of HAML by proving that the current state-of-the-art cooperative MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a natural outcome of our theory, we propose HAML extensions of two well-known RL algorithms, HAA2C (for A2C) and HADDPG (for DDPG), and demonstrate their effectiveness against strong baselines on StarCraftII and Multi-Agent MuJoCo tasks.
An Introduction to Multi-Agent Reinforcement Learning and Review of its Application to Autonomous Mobility
Schmidt, Lukas M., Brosig, Johanna, Plinge, Axel, Eskofier, Bjoern M., Mutschler, Christopher
Many scenarios in mobility and traffic involve multiple different agents that need to cooperate to find a joint solution. Recent advances in behavioral planning use Reinforcement Learning to find effective and performant behavior strategies. However, as autonomous vehicles and vehicle-to-X communications become more mature, solutions that only utilize single, independent agents leave potential performance gains on the road. Multi-Agent Reinforcement Learning (MARL) is a research field that aims to find optimal solutions for multiple agents that interact with each other. This work aims to give an overview of the field to researchers in autonomous mobility. We first explain MARL and introduce important concepts. Then, we discuss the central paradigms that underlie MARL algorithms, and give an overview of state-of-the-art methods and ideas in each paradigm. With this background, we survey applications of MARL in autonomous mobility scenarios and give an overview of existing scenarios and implementations.
Decomposing your complex AI problem: Hierarchy
Problem domains often come with an innate hierarchy. Naturally, this prompts the question: which level(s) of the hierarchy should be modelled? For example, the US stock market can be modelled as a whole, at the index level (think of the Dow Jones), or at the level of individual stocks. In a linear system, the way that the lower levels interact with the upper levels is "linear", or directly correlated. Take the example of an analytics system for business intelligence and reporting: sales, inventories, etc.
Optimal and Bounded-Suboptimal Multi-Goal Task Assignment and Path Finding
Zhong, Xinyi, Li, Jiaoyang, Koenig, Sven, Ma, Hang
We formalize and study the multi-goal task assignment and path finding (MG-TAPF) problem from theoretical and algorithmic perspectives. The MG-TAPF problem is to compute an assignment of tasks to agents, where each task consists of a sequence of goal locations, and collision-free paths for the agents that visit all goal locations of their assigned tasks in sequence. Theoretically, we prove that the MG-TAPF problem is NP-hard to solve optimally. We present algorithms that build upon algorithmic techniques for the multi-agent path finding problem and solve the MG-TAPF problem optimally and bounded-suboptimally. We experimentally compare these algorithms on a variety of different benchmark domains.
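To make the problem statement concrete, the following sketch checks whether a candidate MG-TAPF solution is feasible. It is not one of the paper's algorithms. It assumes a 4-connected grid, one task per agent, discrete time steps, and that an agent waits at its final cell after its path ends; the function names and the data layout are hypothetical.

```python
def visits_in_sequence(path, goals):
    """True if the path reaches the goal locations in the given order."""
    i = 0
    for cell in path:
        if i < len(goals) and cell == goals[i]:
            i += 1
    return i == len(goals)

def legal_moves(path):
    """True if every step is a wait or a move to a 4-connected neighbor."""
    return all(abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1
               for a, b in zip(path, path[1:]))

def position(path, t):
    """Cell occupied at time t; the agent waits at its last cell (assumption)."""
    return path[min(t, len(path) - 1)]

def collision_free(paths):
    """Check all agent pairs for vertex conflicts and edge (swap) conflicts."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for a in range(len(paths)):
            for b in range(a + 1, len(paths)):
                if position(paths[a], t) == position(paths[b], t):
                    return False  # two agents in the same cell
                if (position(paths[a], t) == position(paths[b], t + 1)
                        and position(paths[a], t + 1) == position(paths[b], t)):
                    return False  # two agents swap cells along one edge
    return True

def is_valid_solution(assignment, tasks, paths):
    """assignment maps agent index -> task name; tasks maps name -> goal sequence."""
    return (all(legal_moves(p) for p in paths)
            and all(visits_in_sequence(paths[agent], tasks[name])
                    for agent, name in assignment.items())
            and collision_free(paths))

tasks = {"t1": [(0, 1), (0, 2)], "t2": [(1, 1)]}
paths = [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 1)]]
ok = is_valid_solution({0: "t1", 1: "t2"}, tasks, paths)
```

A solver for MG-TAPF has to search over both the assignment and the paths; a checker like this one only verifies the constraints that any returned solution must satisfy.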
Proportional Fair Division of Multi-layered Cakes
In our daily lives, there are many examples of time scheduling, whereby we arrange our time so that we can complete our essential daily tasks. Consider a group of university students who want to use multiple facilities, such as a seminar room and an indoor games room. The opening and closing times of the two facilities are the same. Each student in the group has a different preferred time slot for each facility, and everyone wants to use both facilities. The problem of fairly dividing a divisible heterogeneous resource among agents with different preferences over that resource has been studied in the classical cake cutting model (Steinhaus [1949], Brams and Jones [2006], Procaccia [2013], Kurokawa et al. [2013]). We model a cake as the unit interval [0, 1], which is divided among agents, where each agent has a different preference over the entire cake. The concept of fair division was introduced by three mathematicians, Hugo Steinhaus, Bronisław Knaster, and Stefan Banach, at a meeting in the Scottish Café in Lvov (in Poland) toward the end of World War II.
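As a point of reference for the classical single-layer model, the sketch below implements the well-known cut-and-choose protocol for two agents on the cake [0, 1]. It is not the multi-layered method of this paper. It assumes each agent's preference is a piecewise-constant density with non-negative integer heights over equal-width segments, and the function names are hypothetical. An allocation is proportional when each of the n agents values its own piece at no less than 1/n of the whole cake, so 1/2 here.

```python
from fractions import Fraction

def value(density, a, b):
    """Normalised value of [a, b]; the whole cake [0, 1] is worth exactly 1."""
    n = len(density)
    total = sum(density)
    v = Fraction(0)
    for i, d in enumerate(density):
        lo, hi = Fraction(i, n), Fraction(i + 1, n)
        overlap = min(b, hi) - max(a, lo)
        if overlap > 0:
            v += d * overlap
    return v * n / total

def half_point(density):
    """Point x at which the agent values [0, x] at exactly 1/2."""
    n = len(density)
    target = Fraction(sum(density), 2 * n)  # half of the unnormalised mass
    acc = Fraction(0)
    for i, d in enumerate(density):
        piece = Fraction(d, n)
        if d > 0 and acc + piece >= target:
            return Fraction(i, n) + (target - acc) / d
        acc += piece
    return Fraction(1)

def cut_and_choose(cutter, chooser):
    """The cutter splits the cake into two halves of equal value to itself;
    the chooser then takes whichever piece it prefers."""
    x = half_point(cutter)
    left, right = (Fraction(0), x), (x, Fraction(1))
    if value(chooser, *left) >= value(chooser, *right):
        return {"cutter": right, "chooser": left}
    return {"cutter": left, "chooser": right}

cutter_pref = [1, 3]   # cutter prefers the second half of the cake
chooser_pref = [1, 1]  # chooser values the cake uniformly
pieces = cut_and_choose(cutter_pref, chooser_pref)
cutter_share = value(cutter_pref, *pieces["cutter"])
chooser_share = value(chooser_pref, *pieces["chooser"])
```

Exact rational arithmetic via `Fraction` avoids rounding when checking that both shares reach 1/2. In this run the cutter cuts at 2/3 and the chooser takes the left piece.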
Constrained multi-agent ergodic area surveying control based on finite element approximation of the potential field
Ivić, Stefan, Sikirica, Ante, Crnković, Bojan
Heat Equation Driven Area Coverage (HEDAC) is a state-of-the-art multi-agent ergodic motion control method guided by the gradient of a potential field. A finite element method is implemented here to obtain a solution of the Helmholtz partial differential equation, which models the potential field for surveying motion control. This allows us to survey arbitrarily shaped domains and to include obstacles in an elegant and robust manner intrinsic to HEDAC's fundamental idea. For a simple kinematic motion, the obstacle and boundary avoidance constraints are successfully handled by directing the agent motion with the gradient of the potential. However, including additional constraints, such as the minimal clearance distance from stationary and moving obstacles and the minimal path curvature radius, requires further alterations of the control algorithm. We introduce a relatively simple yet robust approach for handling these constraints by formulating a straightforward optimization problem based on collision-free escape route maneuvers. This approach provides a guaranteed collision avoidance mechanism while being computationally inexpensive as a result of the optimization problem partitioning. The proposed motion control is evaluated in simulations of three realistic surveying scenarios, showing the effectiveness of the surveying and the robustness of the control algorithm. Furthermore, potential maneuvering difficulties due to improperly defined surveying scenarios are highlighted, and we provide guidelines on how to overcome them. The results are promising and indicate real-world applicability of the proposed constrained multi-agent motion control for autonomous surveying and potentially other HEDAC applications.