Agents
Exploration and Coverage with Swarms of Settling Agents
Rappel, Ori, Ben-Asher, Joseph, Bruckstein, Alfred
We consider several algorithms for exploring and filling an unknown, connected region, by simple, airborne agents. The agents are assumed to be identical, autonomous, anonymous and to have a finite amount of memory. The region is modeled as a connected sub-set of a regular grid composed of square cells. The algorithms described herein are suited for Micro Air Vehicles (MAV) since these air vehicles enable unobstructed views of the ground below and can move freely in space at various heights. The agents explore the region by applying various action-rules based on locally acquired information Some of them may settle in unoccupied cells as the exploration progresses. Settled agents become virtual pheromones for the exploration and coverage process, beacons that subsequently aid the remaining, and still exploring, mobile agents. We introduce a backward propagating information diffusion process as a way to implement a deterministic indicator of process termination and guide the mobile agents. For the proposed algorithms, complete covering of the graph in finite time is guaranteed when the size of the region is fixed. Bounds on the coverage times are also derived. Extensive simulation results exhibit good agreement with the theoretical predictions.
Introducing emotions in the reasoning cycle ofnormative aware agents
Perez, Daniel, Argente, Estefania, Del Val, Elena, Valero, Soledad
Human relationships are complex processes that often involve following certain rules that regulate interactions and/or expected outcomes. These rules may be imposed by an authority or established by society. In multi-agent systems, normative systems have extensively addressed aspects such as norm synthesis, norm conflict detection, as well as norm emergence. However, if human behaviour is to be adequately simulated, not only normative aspects but also emotional aspects have to be taken into account. In this paper, we propose a Jason agent architecture that incorporates norms and emotions in its reasoning process to determine which plan (actions) to execute. The proposal is evaluated through a scenario based on a social network, which allows us to analyse the benefits of using emotional normative agents to achieve simulations closer to real human world.
Synthesis of Cost-Optimal Multi-Agent Systems for Resource Allocation
Multi-agent systems for resource allocation (MRAs) have been introduced as a concept for modelling competitive resource allocation problems in distributed computing. An MRA is composed of a set of agents and a set of resources. Each agent has goals in terms of allocating certain resources. For MRAs it is typically of importance that they are designed in a way such that there exists a strategy that guarantees that all agents will achieve their goals. The corresponding model checking problem is to determine whether such a winning strategy exists or not, and the synthesis problem is to actually build the strategy. While winning strategies ensure that all goals will be achieved, following such strategies does not necessarily involve an optimal use of resources. In this paper, we present a technique that allows to synthesise cost-optimal solutions to distributed resource allocation problems. We consider a scenario where system components such as agents and resources involve costs. A multi-agent system shall be designed that is cost-minimal but still capable of accomplishing a given set of goals. Our approach synthesises a winning strategy that minimises the cumulative costs of the components that are required for achieving the goals. The technique is based on a propositional logic encoding and a reduction of the synthesis problem to the maximum satisfiability problem (Max-SAT). Hence, a Max-SAT solver can be used to perform the synthesis. From a truth assignment that maximises the number of satisfied clauses of the encoding a cost-optimal winning strategy as well as a cost-optimal system can be immediately derived.
Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning
Hong, Yitian, Jin, Yaochu, Tang, Yang
In cooperative multi-agent reinforcement learning, centralized training and decentralized execution (CTDE) has achieved remarkable success. Individual Global Max (IGM) decomposition, which is an important element of CTDE, measures the consistency between local and joint policies. The majority of IGM-based research focuses on how to establish this consistent relationship, but little attention has been paid to examining IGM's potential flaws. In this work, we reveal that the IGM condition is a lossy decomposition, and the error of lossy decomposition will accumulated in hypernetwork-based methods. To address the above issue, we propose to adopt an imitation learning strategy to separate the lossy decomposition from Bellman iterations, thereby avoiding error accumulation. The proposed strategy is theoretically proved and empirically verified on the StarCraft Multi-Agent Challenge benchmark problem with zero sight view. The results also confirm that the proposed method outperforms state-of-the-art IGM-based approaches.
Analysis Of The Anytime MAPF Solvers Based On The Combination Of Conflict-Based Search (CBS) and Focal Search (FS)
Ivanashev, Ilya, Andreychuk, Anton, Yakovlev, Konstantin
Conflict-Based Search (CBS) is a widely used algorithm for solving multi-agent pathfinding (MAPF) problems optimally. The core idea of CBS is to run hierarchical search, when, on the high level the tree of solutions candidates is explored, and on the low-level an individual planning for a specific agent (subject to certain constraints) is carried out. To trade-off optimality for running time different variants of bounded sub-optimal CBS were designed, which alter both high- and low-level search routines of CBS. Moreover, anytime variant of CBS does exist that applies Focal Search (FS) to the high-level of CBS - Anytime BCBS. However, no comprehensive analysis of how well this algorithm performs compared to the naive one, when we simply re-invoke CBS with the decreased sub-optimality bound, was present. This work aims at filling this gap. Moreover, we present and evaluate another anytime version of CBS that uses FS on both levels of CBS. Empirically, we show that its behavior is principally different from the one demonstrated by Anytime BCBS. Finally, we compare both algorithms head-to-head and show that using Focal Search on both levels of CBS can be beneficial in a wide range of setups.
Altruistic Hedonic Games
Kerkmann, Anna Maria, Nguyen, Nhan-Tam, Rey, Anja, Rey, Lisa, Rothe, Jรถrg, Schend, Lena, Wiechers, Alessandra
Hedonic games are coalition formation games in which players have preferences over the coalitions they can join. For a long time, all models of representing hedonic games were based upon selfish players only. Among the known ways of representing hedonic games compactly, we focus on friend-oriented hedonic games and propose a novel model for them that takes into account not only the players' own preferences but also their friends' preferences. Depending on the order in which players look at their own or their friends' preferences, we distinguish three degrees of altruism: selfish-first, equal-treatment, and altruistic-treatment preferences. We study both the axiomatic properties of these games and the computational complexity of problems related to various common stability concepts.
Motion Planning Under Uncertainty with Complex Agents and Environments via Hybrid Search
Strawser, Daniel (a:1:{s:5:"en_US";s:3:"MIT";}) | Williams, Brian (MIT)
As autonomous systems and robots are applied to more real world situations, they must reason about uncertainty when planning actions. Mission success oftentimes cannot be guaranteed and the planner must reason about the probability of failure. Unfortunately, computing a trajectory that satisfies mission goals while constraining the probability of failure is difficult because of the need to reason about complex, multidimensional probability distributions. Recent methods have seen success using chance-constrained, model-based planning. However, the majority of these methods can only handle simple environment and agent models. We argue that there are two main drawbacks of current approaches to goal-directed motion planning under uncertainty. First, current methods suffer from an inability to deal with expressive environment models such as 3D non-convex obstacles. Second, most planners rely on considerable simplifications when computing trajectory risk including approximating the agentโs dynamics, geometry, and uncertainty. In this article, we apply hybrid search to the risk-bound, goal-directed planning problem. The hybrid search consists of a region planner and a trajectory planner. The region planner makes discrete choices by reasoning about geometric regions that the autonomous agent should visit in order to accomplish its mission. In formulating the region planner, we propose landmark regions that help produce obstacle-free paths. The region planner passes paths through the environment to a trajectory planner; the task of the trajectory planner is to optimize trajectories that respect the agentโs dynamics and the userโs desired risk of mission failure. We discuss three approaches to modeling trajectory risk: a CDF-based approach, a sampling-based collocation method, and an algorithm named Shooting Method Monte Carlo. These models allow computation of trajectory risk with more complex environments, agent dynamics, geometries, and models of uncertainty than past approaches. A variety of 2D and 3D test cases are presented including a linear case, a Dubins car model, and an underwater autonomous vehicle. The method is shown to outperform other methods in terms of speed and utility of the solution. Additionally, the models of trajectory risk are shown to better approximate risk in simulation.
Relative Transformation Estimation Based on Fusion of Odometry and UWB Ranging Data
Nguyen, Thien Hoang, Xie, Lihua
In this work, the problem of 4 degree-of-freedom (3D position and heading) robot-to-robot relative frame transformation estimation using onboard odometry and inter-robot distance measurements is studied. Firstly, we present a theoretical analysis of the problem, namely the derivation and interpretation of the Cramer-Rao Lower Bound (CRLB), the Fisher Information Matrix (FIM) and its determinant. Secondly, we propose optimization-based methods to solve the problem, including a quadratically constrained quadratic programming (QCQP) and the corresponding semidefinite programming (SDP) relaxation. Moreover, we address practical issues that are ignored in previous works, such as accounting for spatial-temporal offsets between the ultra-wideband (UWB) and odometry sensors, rejecting UWB outliers and checking for singular configurations before commencing operation. Lastly, extensive simulations and real-life experiments with aerial robots show that the proposed QCQP and SDP methods outperform state-of-the-art methods, especially in geometrically poor or large measurement noise conditions. In general, the QCQP method provides the best results at the expense of computational time, while the SDP method runs much faster and is sufficiently accurate in most cases.
Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations
Seo, Sangwon, Unhelkar, Vaibhav V.
We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.
Compressed Particle-Based Federated Bayesian Learning and Unlearning
Gong, Jinu, Simeone, Osvaldo, Kang, Joonhyuk
Conventional frequentist FL schemes are known to yield overconfident decisions. Bayesian FL addresses this issue by allowing agents to process and exchange uncertainty information encoded in distributions over the model parameters. However, this comes at the cost of a larger per-iteration communication overhead. This letter investigates whether Bayesian FL can still provide advantages in terms of calibration when constraining communication bandwidth. We present compressed particle-based Bayesian FL protocols for FL and federated "unlearning" that apply quantization and sparsification across multiple particles. The experimental results confirm that the benefits of Bayesian FL are robust to bandwidth constraints.