Agents
Decentralized Coordination in Partially Observable Queueing Networks
Jia, Jiekai, Tahir, Anam, Koeppl, Heinz
We consider communication in a fully cooperative multi-agent system, where the agents have partial observation of the environment and must act jointly to maximize the overall reward. We have a discrete-time queueing network where agents route packets to queues based only on the partial information of the current queue lengths. The queues have limited buffer capacity, so packet drops happen when they are sent to a full queue. In this work, we implemented a communication channel for the agents to share their information in order to reduce the packet drop rate. For efficient information sharing we use an attention-based communication model, called ATVC, to select informative messages from other agents. The agents then infer the state of queues using a combination of the variational auto-encoder, VAE, and product-of-experts, PoE, model. Ultimately, the agents learn what they need to communicate and with whom, instead of communicating all the time with everyone. We also show empirically that ATVC is able to infer the true state of the queues and leads to a policy which outperforms existing baselines.
UWB Role Allocation with Distributed Ledger Technologies for Scalable Relative Localization in Multi-Robot Systems
Morón, Paola Torrico, Salimi, Salma, Queralta, Jorge Peña, Westerlund, Tomi
Systems for relative localization in multi-robot systems based on ultra-wideband (UWB) ranging have recently emerged as robust solutions for GNSS-denied environments. Scalability remains one of the key challenges, particularly in ad-hoc deployments. Recent solutions include dynamic allocation of active and passive localization modes for different robots or nodes in the system. With larger-scale systems becoming more distributed, key research questions arise in the areas of security and trustability of such localization systems. This paper studies the potential integration of collaborative-decision making processes with distributed ledger technologies. Specifically, we investigate the design and implementation of a methodology for running an UWB role allocation algorithm within smart contracts in a blockchain. In previous works, we have separately studied the integration of ROS2 with the Hyperledger Fabric blockchain, and introduced a new algorithm for scalable UWB-based localization. In this paper, we extend these works by (i) running experiments with larger number of mobile robots switching between different spatial configurations and (ii) integrating the dynamic UWB role allocation algorithm into Fabric smart contracts for distributed decision-making in a system of multiple mobile robots. This enables us to deliver the same functionality within a secure and trustable process, with enhanced identity and data access management. Our results show the effectiveness of the UWB role allocation for continuously varying spatial formations of six autonomous mobile robots, while demonstrating a low impact on latency and computational resources of adding the blockchain layer that does not affect the localization process.
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
Cen, Shicong, Chen, Fan, Chi, Yuejie
A major challenge in multi-agent systems is that the system complexity grows dramatically with the number of agents as well as the size of their action spaces, which is typical in real world scenarios such as autonomous vehicles, robotic teams, network routing, etc. It is hence in imminent need to design decentralized or independent algorithms where the update of each agent is only based on their local observations without the need of introducing complex communication/coordination mechanisms. In this work, we study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games, where the difference in an agent's utility function due to unilateral deviation matches exactly that of a common potential function. The proposed entropy-regularized NPG method enables each agent to deploy symmetric, decentralized, and multiplicative updates according to its own payoff. We show that the proposed method converges to the quantal response equilibrium (QRE) -- the equilibrium to the entropy-regularized game -- at a sublinear rate, which is independent of the size of the action space and grows at most sublinearly with the number of agents. Appealingly, the convergence rate further becomes independent with the number of agents for the important special case of identical-interest games, leading to the first method that converges at a dimension-free rate. Our approach can be used as a smoothing technique to find an approximate Nash equilibrium (NE) of the unregularized problem without assuming that stationary policies are isolated.
Categorical semantics of compositional reinforcement learning
Bakirtzis, Georgios, Savvas, Michail, Topcu, Ufuk
Reinforcement learning (RL) often requires decomposing a problem into subtasks and composing learned behaviors on these tasks. Compositionality in RL has the potential to create modular subtask units that interface with other system capabilities. However, generating compositional models requires the characterization of minimal assumptions for the robustness of the compositional feature. We develop a framework for a \emph{compositional theory} of RL using a categorical point of view. Given the categorical representation of compositionality, we investigate sufficient conditions under which learning-by-parts results in the same optimal policy as learning on the whole. In particular, our approach introduces a category $\mathsf{MDP}$, whose objects are Markov decision processes (MDPs) acting as models of tasks. We show that $\mathsf{MDP}$ admits natural compositional operations, such as certain fiber products and pushouts. These operations make explicit compositional phenomena in RL and unify existing constructions, such as puncturing hazardous states in composite MDPs and incorporating state-action symmetry. We also model sequential task completion by introducing the language of zig-zag diagrams that is an immediate application of the pushout operation in $\mathsf{MDP}$.
Why Commonsense Knowledge is not (and can not be) Learned
Commonsense (background) knowledge, at least the kind of knowledge that we fetch and relay upon in the process of language understanding: (i) cannot be learned by processing vast amounts of text because that knowledge is never explicitly stated in the text -- and you cannot find what's not there; and (ii) that background knowledge cannot be learned perceptually from observation since the vast amount of the crucial background knowledge is universal, is not probablistic nor approximate, and so it cannot be susceptible to individual observations. The shared background knowledge needed in the process of language understanding is the kind of knowledge that obeys and respects the laws of nature and as such it has to be codified. In fact, that knowledge must be codified in a symbolic system that quantifies over variables of specific ontological types. There's a consensus among researchers investigating the neurological, psychological and evolutionary aspects of human linguistic communication that languages have evolved according to the information-theoretic principle of least effort. Specifically, it has been established that interacting communicative agents tend to produce utterances that minimize the complexity of coding a thought as well as minimize the process of decoding linguistic utterances back to the intended thought [1] -- thus finding an optimal point where the effort of both speaker and listener is minimal.
Benchmark Results for Bookshelf Organization Problem as Mixed Integer Nonlinear Program with Mode Switch and Collision Avoidance
Lin, Xuan, Fernandez, Gabriel I., Hong, Dennis W.
Mixed integer convex and nonlinear programs, MICP and MINLP, are expressive but require long solving times. Recent work that combines data-driven methods on solver heuristics has shown potential to overcome this issue allowing for applications on larger scale practical problems. To solve mixed-integer bilinear programs online with data-driven methods, several formulations exist including mathematical programming with complementary constraints (MPCC), mixed-integer programming (MIP). In this work, we benchmark the performances of those data-driven schemes on a bookshelf organization problem that has discrete mode switch and collision avoidance constraints. The success rate, optimal cost and solving time are compared along with non-data-driven methods. Our proposed methods are demonstrated as a high level planner for a robotic arm for the bookshelf problem.
Stag hunt game-based approach for cooperative UAVs
Nguyen, L. V., Herrera, I. Torres, Le, T. H., Phung, M. D., Aguilera, R. P., Ha, Q. P.
Unmanned aerial vehicles (UAVs) are being employed in many areas such as photography, emergency, entertainment, defence, agriculture, forestry, mining and construction. Over the last decade, UAV technology has found applications in numerous construction project phases, ranging from site mapping, progress monitoring, building inspection, damage assessments, and material delivery. While extensive studies have been conducted on the advantages of UAVs for various construction-related processes, studies on UAV collaboration to improve the task capacity and efficiency are still scarce. This paper proposes a new cooperative path planning algorithm for multiple UAVs based on the stag hunt game and particle swarm optimization (PSO). First, a cost function for each UAV is defined, incorporating multiple objectives and constraints. The UAV game framework is then developed to formulate the multi-UAV path planning into the problem of finding payoff-dominant equilibrium. Next, a PSO-based algorithm is proposed to obtain optimal paths for the UAVs. Simulation results for a large construction site inspected by three UAVs indicate the effectiveness of the proposed algorithm in generating feasible and efficient flight paths for UAV formation during the inspection task.
Opinion Leader Detection in Online Social Networks Based on Output and Input Links
Ghorbani, Zahra, Khasteh, Seyed Hossein, Ghafouri, Saeid
The understanding of how users in a network update their opinions based on their neighbours opinions has attracted a great deal of interest in the field of network science, and a growing body of literature recognises the significance of this issue. In this research paper, we propose a new dynamic model of opinion formation in directed networks. In this model, the opinion of each node is updated as the weighted average of its neighbours opinions, where the weights represent social influence. We define a new centrality measure as a social influence metric based on both influence and conformity. We measure this new approach using two opinion formation models: (i) the Degroot model and (ii) our own proposed model. Previously published research studies have not considered conformity, and have only considered the influence of the nodes when computing the social influence. In our definition, nodes with low in-degree and high out-degree that were connected to nodes with high out-degree and low in-degree had higher centrality. As the main contribution of this research, we propose an algorithm for finding a small subset of nodes in a social network that can have a significant impact on the opinions of other nodes. Experiments on real-world data demonstrate that the proposed algorithm significantly outperforms previously published state-of-the-art methods.
Pricing Problems with Buyer Preselection
Bilò, Vittorio | Flammini, Michele | Monaco, Gianpiero | Moscardelli, Luca (a:1:{s:5:"en_US";s:28:"University of Chieti-Pescara";})
We investigate the problem of preselecting a subset of buyers (also called agents) participating in a market so as to optimize the performance of stable outcomes. We consider four scenarios arising from the combination of two stability notions, namely market envy-freeness and agent envy-freeness, with the two state-of-the-art objective functions of social welfare and seller’s revenue. When insisting on market envy-freeness, we prove that the problem cannot be approximated within n 1−ε (with n being the number of buyers) for any ε > 0, under both objective functions; we also provide approximation algorithms with an approximation ratio tight up to subpolynomial multiplicative factors for social welfare and the seller’s revenue. The negative result, in particular, holds even for markets with single-minded buyers. We also prove that maximizing the seller’s revenue is NP-hard even for a single buyer, thus closing a previous open question. Under agent envy-freeness and for both objective functions, instead, we design a polynomial time algorithm transforming any stable outcome for a market involving any subset of buyers into a stable outcome for the whole market without worsening its performance. This result creates an interesting middle-ground situation where, if on the one hand buyer preselection cannot improve the performance of agent envy-free outcomes, on the other one it can be used as a tool for simplifying the combinatorial structure of the buyers’ valuation functions in a given market. Finally, we consider the restricted case of multi-unit markets, where all items are of the same type and are assigned the same price. For these markets, we show that preselection may improve the performance of stable outcomes in all of the four considered scenarios, and design corresponding approximation algorithms.
The History of AI Rights Research
This report documents the history of research on AI rights and other moral consideration of artificial entities. It highlights key intellectual influences on this literature as well as research and academic discussion addressing the topic more directly. We find that researchers addressing AI rights have often seemed to be unaware of the work of colleagues whose interests overlap with their own. Academic interest in this topic has grown substantially in recent years; this reflects wider trends in academic research, but it seems that certain influential publications, the gradual, accumulating ubiquity of AI and robotic technology, and relevant news events may all have encouraged increased academic interest in this specific topic. We suggest four levers that, if pulled on in the future, might increase interest further: the adoption of publication strategies similar to those of the most successful previous contributors; increased engagement with adjacent academic fields and debates; the creation of specialized journals, conferences, and research institutions; and more exploration of legal rights for artificial entities.