Agents
Roboat: autonomous boats in Amsterdam – how AI driven autonomous systems will work
The system has become considerably more accurate. Roboat II autonomously navigated the canals of Amsterdam for three hours, collecting data, and returned to its starting location with an error margin of only 0.17 meters, or less than 7 inches. There are now advanced navigation and control algorithms for communication and collaboration between boats. The system is modeled on an ant colony using a distributed controller. In this model, there is no direct communication among the connected robots; only one leader knows the destination.
On the Online Coalition Structure Generation Problem
Flammini, Michele, Monaco, Gianpiero, Moscardelli, Luca, Shalom, Mordechai, Zaks, Shmuel
We consider the online version of the coalition structure generation problem, in which agents, corresponding to the vertices of a graph, appear in an online fashion and have to be partitioned into coalitions by an authority (i.e., an online algorithm). When an agent appears, the algorithm has to decide whether to put the agent into an existing coalition or to create a new one containing, at this moment, only her. The decision is irrevocable. The objective is partitioning agents into coalitions so as to maximize the resulting social welfare, that is, the sum of all coalition values. We consider two cases for the value of a coalition: (1) the sum of the weights of its edges, and (2) the sum of the weights of its edges divided by its size. Coalition structures appear in a variety of applications in AI, multi-agent systems, networks, as well as in social networks, data analysis, computational biology, game theory, and scheduling. For each of the coalition value functions we consider the bounded and unbounded cases depending on whether or not the size of a coalition can exceed a given value α. Furthermore, we consider the case of a limited number of coalitions and various weight functions for the edges, i.e., unrestricted, positive and constant weights. We show tight or nearly tight bounds for the competitive ratio in each case.
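The two coalition value functions and the irrevocable online decision described above can be made concrete with a small sketch. The greedy placement rule, the function names, and the example weights below are illustrative assumptions, not the algorithms analysed in the paper; the sketch only shows how the social welfare is computed and how an online rule commits each agent on arrival.

```python
from itertools import combinations

def coalition_value(coalition, weights, normalize=False):
    """Sum of edge weights inside the coalition; divided by its size if normalize=True."""
    total = sum(weights.get(frozenset(pair), 0.0)
                for pair in combinations(coalition, 2))
    return total / len(coalition) if normalize and coalition else total

def social_welfare(partition, weights, normalize=False):
    """Social welfare is the sum of all coalition values."""
    return sum(coalition_value(c, weights, normalize) for c in partition)

def greedy_online(arrivals, weights, normalize=False, max_size=None):
    """Hypothetical greedy rule (an assumption for illustration): put each
    arriving agent where the immediate welfare gain is largest, or open a new
    coalition if no existing one yields a positive gain. Decisions are never revised."""
    partition = []
    for agent in arrivals:
        best_gain, best_idx = 0.0, None  # a new singleton coalition has gain 0
        for i, c in enumerate(partition):
            if max_size is not None and len(c) >= max_size:
                continue  # bounded case: coalition size may not exceed max_size
            gain = (coalition_value(c | {agent}, weights, normalize)
                    - coalition_value(c, weights, normalize))
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:
            partition.append({agent})
        else:
            partition[best_idx].add(agent)
    return partition

# Example graph with hypothetical weights on three agents.
weights = {frozenset({"a", "b"}): 3.0,
           frozenset({"b", "c"}): -1.0,
           frozenset({"a", "c"}): 2.0}
plain = greedy_online(["a", "b", "c"], weights)
averaged = greedy_online(["a", "b", "c"], weights, normalize=True)
```

With these weights the two value functions lead to different partitions: under the plain sum all three agents end up together, while under the size-normalized value the third agent is better off alone.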
The Complexity of Data-Driven Norm Synthesis and Revision
Dell'Anna, Davide, Alechina, Natasha, Logan, Brian, Löffler, Maarten, Dalpiaz, Fabiano, Dastani, Mehdi
Norms have been widely proposed as a way of coordinating and controlling the activities of agents in a multi-agent system (MAS). A norm specifies the behaviour an agent should follow in order to achieve the objective of the MAS. However, designing norms to achieve a particular system objective can be difficult, particularly when there is no direct link between the language in which the system objective is stated and the language in which the norms can be expressed. In this paper, we consider the problem of synthesising a norm from traces of agent behaviour, where each trace is labelled with whether the behaviour satisfies the system objective. We show that the norm synthesis problem is NP-complete.
Intention Recognition for Multiple Agents
Zhang, Zhang, Zeng, Yifeng, Chen, Yingke
Intention recognition is an important step to facilitate collaboration in multi-agent systems. Existing work mainly focuses on intention recognition in a single-agent setting and uses a descriptive model, e.g. Bayesian networks, in the recognition process. In this paper, we resort to a prescriptive approach to model agents' behaviour, in which their intentions are hidden in the implementation of their plans. We introduce landmarks into the behavioural model, thereby enriching the informative features for identifying common intentions among multiple agents. We further refine the model by focusing only on the action sequences in their plans and provide a lightweight model for identifying and comparing their intentions. The new models provide a simple approach to grouping agents' common intentions based on partial plans observed in agents' interactions. We provide experimental results in support.
Renewable energy integration and microgrid energy trading using multi-agent deep reinforcement learning
Harrold, Daniel J. B., Cao, Jun, Fan, Zhong
In this paper, multi-agent reinforcement learning is used to control a hybrid energy storage system working collaboratively to reduce the energy costs of a microgrid through maximising the value of renewable energy and trading. The agents must learn to control three different types of energy storage system suited for short, medium, and long-term storage under fluctuating demand, dynamic wholesale energy prices, and unpredictable renewable energy generation. Two case studies are considered: the first looks at how the energy storage systems can better integrate renewable energy generation under dynamic pricing, and the second at how those same agents can be used alongside an aggregator agent to sell energy to self-interested external microgrids looking to reduce their own energy bills. This work found that the centralised learning with decentralised execution of the multi-agent deep deterministic policy gradient and its state-of-the-art variants allowed the multi-agent methods to perform significantly better than the control from a single global agent. It was also found that using separate reward functions in the multi-agent approach performed much better than using a single control agent. Being able to trade with the other microgrids, rather than just selling back to the utility grid, was also found to greatly increase the grid's savings.
Active Sensing for Search and Tracking: A Review
Varotto, Luca, Cenedese, Angelo, Cavallaro, Andrea
Active Position Estimation (APE) is the task of localizing one or more targets using one or more sensing platforms. APE is a key task for search and rescue missions, wildlife monitoring, source term estimation, and collaborative mobile robotics. Success in APE depends on the level of cooperation of the sensing platforms, their number, their degrees of freedom and the quality of the information gathered. APE control laws enable active sensing by satisfying either pure-exploitative or pure-explorative criteria. The former minimizes the uncertainty on position estimation; whereas the latter drives the platform closer to its task completion. In this paper, we define the main elements of APE to systematically classify and critically discuss the state of the art in this domain. We also propose a reference framework as a formalism to classify APE-related solutions. Overall, this survey explores the principal challenges and envisages the main research directions in the field of autonomous perception systems for localization tasks. It is also beneficial to promote the development of robust active sensing methods for search and tracking applications.
Efficient Pressure: Improving efficiency for signalized intersections
Wu, Qiang, Zhang, Liang, Shen, Jun, Lü, Linyuan, Du, Bo, Wu, Jianqing
Since conventional approaches could not adapt to dynamic traffic conditions, reinforcement learning (RL) has attracted more attention to help solve the traffic signal control (TSC) problem. However, existing RL-based methods are rarely deployed, considering that they are neither cost-effective in terms of computing resources nor more robust than traditional approaches, which raises a critical research question: how to construct an adaptive controller for TSC with less training and reduced complexity based on an RL-based approach? To address this question, in this paper, we (1) innovatively specify the traffic movement representation as a simple but efficient pressure of vehicle queues in a traffic network, namely efficient pressure (EP); (2) build a traffic signal settings protocol, including phase duration, signal phase number and EP for TSC; (3) design a TSC approach based on the traditional max pressure (MP) approach, namely efficient max pressure (Efficient-MP), using the EP to capture the traffic state; and (4) develop a general RL-based TSC algorithm template: efficient Xlight (Efficient-XLight) under EP. Through comprehensive experiments on multiple real-world datasets in our traffic signal settings protocol for TSC, we demonstrate that efficient pressure is complementary to traditional and RL-based modeling to design better TSC methods. Our code is released on GitHub.
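As background for the pressure-based control mentioned above, the following is a minimal sketch of the conventional max pressure idea: the pressure of a traffic movement is taken as the queue on its incoming lane minus the queue on its outgoing lane, and the controller activates the phase with the largest total pressure. It does not reproduce the paper's efficient pressure (EP) definition; the lane names, queue counts, and phase layout are hypothetical.

```python
def movement_pressure(queues, movement):
    """Pressure of one movement: upstream queue minus downstream queue."""
    upstream, downstream = movement
    return queues[upstream] - queues[downstream]

def phase_pressure(queues, movements):
    """Total pressure of all movements served by one signal phase."""
    return sum(movement_pressure(queues, m) for m in movements)

def max_pressure_phase(queues, phases):
    """Return the name of the phase with the largest total pressure."""
    return max(phases, key=lambda name: phase_pressure(queues, phases[name]))

# Hypothetical intersection: vehicle counts per lane and two phases.
queues = {"N_in": 8, "S_in": 5, "E_in": 2, "W_in": 3,
          "N_out": 1, "S_out": 0, "E_out": 4, "W_out": 2}
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
chosen = max_pressure_phase(queues, phases)
```

In this example the north-south phase has total pressure 12 and the east-west phase has total pressure -1, so the north-south phase is selected.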
Web-Based Fault Diagnostic and Learning System - The International Journal of Advanced Manufacturing Technology
Web-based technology holds great potential for enabling the rapid dissemination of information and facilitating distributed decision-making. This paper presents a novel knowledge-based multi-agent system for remote fault diagnosis, which is composed of diagnostic and learning agents (DLAs), machine agents (MAs) and a central management agent (CMA). Machines are remotely diagnosed by the DLAs through the communication channels between the MAs and the DLAs. In addition, the DLAs can learn new expertise from the users, and the CMA can update the central knowledge base (CKB) shared by all the DLAs with the valuable expertise. When faults that cannot be solved with the present knowledge base occur, the DLA can acquire new knowledge, translate it into rules using a rule builder, and update the rules into the CKB.
Distributed Adaptive Learning Under Communication Constraints
Carpentiero, Marco, Matta, Vincenzo, Sayed, Ali H.
This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data. The agents implement a distributed cooperative strategy where each agent is allowed to perform local exchange of information with its neighbors. In order to cope with communication constraints, the exchanged information must be unavoidably compressed. We propose a diffusion strategy nicknamed as ACTC (Adapt-Compress-Then-Combine), which relies on the following steps: i) an adaptation step where each agent performs an individual stochastic-gradient update with constant step-size; ii) a compression step that leverages a recently introduced class of stochastic compression operators; and iii) a combination step where each agent combines the compressed updates received from its neighbors. The distinguishing elements of this work are as follows. First, we focus on adaptive strategies, where constant (as opposed to diminishing) step-sizes are critical to respond in real time to nonstationary variations. Second, we consider the general class of directed graphs and left-stochastic combination policies, which allow us to enhance the interplay between topology and learning. Third, in contrast with related works that assume strong convexity for all individual agents' cost functions, we require strong convexity only at a network level, a condition satisfied even if a single agent has a strongly-convex cost and the remaining agents have non-convex costs. Fourth, we focus on a diffusion (as opposed to consensus) strategy. Under the demanding setting of compressed information, we establish that the ACTC iterates fluctuate around the desired optimizer, achieving remarkable savings in terms of bits exchanged between neighboring agents.
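The three steps named above (adapt, compress, combine) can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the ACTC recursion from the paper: it compresses each intermediate iterate directly with an unbiased randomized rounding quantizer, and the function names, quadratic costs, step-size, and quantization step are all hypothetical choices.

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Unbiased randomized quantizer: round each entry to a neighbouring
    multiple of `step`, rounding up with probability equal to the remainder."""
    scaled = x / step
    low = np.floor(scaled)
    return step * (low + (rng.random(x.shape) < (scaled - low)))

def actc_like_step(W, A, grads, mu, step, rng):
    """One adapt / compress / combine iteration (simplified sketch).

    W     : (K, M) array, one row of parameters per agent.
    A     : (K, K) left-stochastic matrix (columns sum to 1);
            A[l, k] weights the message from agent l at agent k.
    grads : callable (k, w) -> stochastic gradient of agent k at w.
    mu    : constant step-size.
    """
    # Adapt: each agent takes a local stochastic-gradient step.
    psi = np.stack([W[k] - mu * grads(k, W[k]) for k in range(len(W))])
    # Compress: quantize what will be sent to the neighbours.
    q = stochastic_round(psi, step, rng)
    # Combine: agent k forms sum over l of A[l, k] * q[l].
    return A.T @ q

# Hypothetical example: three agents with quadratic costs 0.5 * ||w - t_k||^2.
rng = np.random.default_rng(0)
targets = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
A = np.full((3, 3), 1.0 / 3.0)
W = np.zeros((3, 2))
for _ in range(200):
    W = actc_like_step(W, A, lambda k, w: w - targets[k],
                       mu=0.1, step=0.01, rng=rng)
```

With uniform combination weights the network-level minimizer in this example is the mean of the targets, (1, 1), and the iterates fluctuate around it with an error governed by the quantization step and the step-size.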
Chronological Causal Bandits
This paper studies an instance of the multi-armed bandit (MAB) problem, specifically where several causal MABs operate chronologically in the same dynamical system. In practice, the reward distribution of each bandit is governed by the same non-trivial dependence structure, which is a dynamic causal model. The model is dynamic because we allow each causal MAB to depend on the preceding MAB, and in doing so we are able to transfer information between agents. Our contribution, the Chronological Causal Bandit (CCB), is useful in discrete decision-making settings where the causal effects change over time and can be informed by earlier interventions in the same system. In this paper, we present some early findings of the CCB as demonstrated on a toy problem.