 Agents


ALMA: Hierarchical Learning for Composite Multi-Agent Tasks

arXiv.org Artificial Intelligence

Despite significant progress on multi-agent reinforcement learning (MARL) in recent years, coordination in complex domains remains a challenge. Work in MARL often focuses on solving tasks where agents interact with all other agents and entities in the environment; however, we observe that real-world tasks are often composed of several isolated instances of local agent interactions (subtasks), and each agent can meaningfully focus on one subtask to the exclusion of all else in the environment. In these composite tasks, successful policies can often be decomposed into two levels of decision-making: agents are allocated to specific subtasks and each agent acts productively towards their assigned subtask alone. This decomposed decision making provides a strong structural inductive bias, significantly reduces agent observation spaces, and encourages subtask-specific policies to be reused and composed during training, as opposed to treating each new composition of subtasks as unique. We introduce ALMA, a general learning method for taking advantage of these structured tasks. ALMA simultaneously learns a high-level subtask allocation policy and low-level agent policies. We demonstrate that ALMA learns sophisticated coordination behavior in a number of challenging environments, outperforming strong baselines. ALMA's modularity also enables it to better generalize to new environment configurations. Finally, we find that while ALMA can integrate separately trained allocation and action policies, the best performance is obtained only by training all components jointly.


Deep Attentive Belief Propagation: Integrating Reasoning and Learning for Solving Constraint Optimization Problems

arXiv.org Artificial Intelligence

Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving Constraint Optimization Problems (COPs). It has been shown that BP can achieve state-of-the-art performance on various benchmarks by mixing old and new messages before sending the new one, i.e., damping. However, existing methods of tuning a static damping factor for BP are not only laborious but also harm its performance. Moreover, existing BP algorithms treat each variable node's neighbors equally when composing a new message, which also limits their exploration ability. To address these issues, we seamlessly integrate BP, Gated Recurrent Units (GRUs), and Graph Attention Networks (GATs) within the message-passing framework to reason about dynamic weights and damping factors for composing new BP messages. Our model, Deep Attentive Belief Propagation (DABP), takes the factor graph and the BP messages in each iteration as the input and infers the optimal weights and damping factors through GRUs and GATs, followed by a multi-head attention layer. Furthermore, unlike existing neural-based BP variants, we propose a novel self-supervised learning algorithm for DABP with a smoothed solution cost, which does not require expensive training labels and also avoids the common out-of-distribution issue through efficient online learning. Extensive experiments show that our model significantly outperforms state-of-the-art baselines.
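
As a rough illustration of the damping step mentioned in this abstract, the sketch below mixes a previous message with a freshly computed one using a single static factor. The function name `damp`, the list-of-floats message representation, and the convention that the factor weights the old message are assumptions made for illustration; this is not DABP itself, which infers such factors dynamically.

```python
def damp(old_msg, new_msg, damping):
    """Return a convex mix of the previous and the newly computed message.

    Assumed convention: `damping` in [0, 1] is the weight on the OLD message,
    so damping=0 sends the new message unchanged.
    """
    return [damping * o + (1.0 - damping) * n for o, n in zip(old_msg, new_msg)]


# Hypothetical two-value messages (one cost per value of a binary variable).
old = [0.0, 4.0]
new = [2.0, 0.0]
mixed = damp(old, new, 0.5)  # [1.0, 2.0]
```

A static factor like the `0.5` above is the quantity the abstract describes as laborious to tune by hand.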


Online Allocation and Learning in the Presence of Strategic Agents

arXiv.org Artificial Intelligence

We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them. The agents' valuations for the item in each round are assumed to be i.i.d. but their distribution is a priori unknown to the central planner. Therefore, the central planner needs to implicitly learn these distributions from the observed values in order to pick a good allocation policy. However, an added challenge here is that the agents are strategic with incentives to misreport their valuations in order to receive better allocations. This sets our work apart both from the online auction design settings which typically assume known valuation distributions and/or involve payments, and from the online learning settings that do not consider strategic agents. To that end, our main contribution is an online learning based allocation mechanism that is approximately Bayesian incentive compatible, and when all agents are truthful, guarantees a sublinear regret for individual agents' utility compared to that under the optimal offline allocation policy.


Learn what matters: cross-domain imitation learning with task-relevant embeddings

arXiv.org Artificial Intelligence

We study how an autonomous agent learns to perform a task from demonstrations in a different domain, such as a different environment or different agent. Such cross-domain imitation learning is required to, for example, train an artificial agent from demonstrations of a human expert. We propose a scalable framework that enables cross-domain imitation learning without access to additional demonstrations or further domain knowledge. We jointly train the learner agent's policy and learn a mapping between the learner and expert domains with adversarial training. We effect this by using a mutual information criterion to find an embedding of the expert's state space that contains task-relevant information and is invariant to domain specifics. This step significantly simplifies estimating the mapping between the learner and expert domains and hence facilitates end-to-end learning. We demonstrate successful transfer of policies between considerably different domains, without extra supervision such as additional demonstrations, and in situations where other methods fail.


Stabilizability of multi-agent systems under event-triggered controllers

arXiv.org Artificial Intelligence

Motivated by the heavy consumption of communication and computing resources in the control process, this note studies a fundamental property of a class of multi-agent systems under an event-triggered strategy: the S-stabilizability of a group of multi-agent systems with general linear dynamics under a weakly connected directed topology. The results indicate that S-stabilizability can be characterized so that the stabilizability region and feedback gain serve to evaluate the performance of the protocol. First, a new distributed event-triggered protocol is proposed. Under this protocol, hybrid static and dynamic event-triggered strategies are presented. In particular, using Lyapunov stability theory and a graph partition tool, it is proved that the proposed event-triggered control strategy guarantees that the closed-loop system achieves S-stabilizability if at least one vertex in each iSCC cell receives information from the leader, which reflects the capability of the distributed control law. Further, we demonstrate that stabilizability can be realized if the initial system matrix A is Hurwitz. Moreover, it is confirmed that the designed static event-triggered condition is a limit case of the dynamic event-triggered condition and guarantees Zeno-free behavior. Finally, the validity of the theoretical results is verified by numerical simulation.


Cooperative Tuning of Multi-Agent Optimal Control Systems

arXiv.org Artificial Intelligence

This paper investigates the problem of cooperative tuning of multi-agent optimal control systems, where a network of agents (i.e., multiple coupled optimal control systems) adjusts parameters in their dynamics, objective functions, or controllers in a coordinated way to minimize the sum of their loss functions. Unlike classical techniques for tuning parameters in a controller, we allow tunable parameters to appear in both the system dynamics and the objective functions of each agent. A framework is developed to allow all agents to reach a consensus on the tunable parameter, which minimizes team loss. The key idea of the proposed algorithm rests on the integration of consensus-based distributed optimization for a multi-agent system and a gradient generator capturing the optimal performance as a function of the parameter in the feedback loop tuning the parameter for each agent. Both theoretical results and simulations for a synchronous multi-agent rendezvous problem are provided to validate the proposed method for cooperative tuning of multi-agent optimal control.
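
To make "consensus-based distributed optimization" concrete, here is a minimal sketch of a generic consensus-plus-gradient update on a scalar parameter. It is not the algorithm of this paper: the quadratic stand-in losses, the mixing matrix `weights`, the step size, and all names are assumptions chosen for illustration, and the lambdas in `grads` stand in for whatever gradient generator an agent actually has.

```python
def consensus_gradient_step(estimates, weights, grads, step):
    """One round: average neighbours' estimates, then take a local gradient step."""
    n = len(estimates)
    mixed = [sum(weights[i][j] * estimates[j] for j in range(n)) for i in range(n)]
    return [mixed[i] - step * grads[i](estimates[i]) for i in range(n)]


# Hypothetical setup: agent i has loss 0.5 * (theta - c_i)^2, so its gradient is theta - c_i.
targets = [0.0, 3.0, 6.0]
grads = [lambda theta, c=c: theta - c for c in targets]

# Assumed doubly stochastic mixing weights for a fully connected 3-agent network.
weights = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

estimates = [0.0, 0.0, 0.0]
for _ in range(500):
    estimates = consensus_gradient_step(estimates, weights, grads, step=0.05)

# The network average approaches 3.0, the minimizer of the summed losses.
average = sum(estimates) / len(estimates)
```

With a constant step size, as here, the agents agree only approximately while their average tracks the team optimum; shrinking the step narrows the disagreement.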


A Constraint-Driven Approach to Line Flocking: The V Formation as an Energy-Saving Strategy

arXiv.org Artificial Intelligence

The study of robotic flocking has received significant attention in the past twenty years. In this article, we present a constraint-driven control algorithm that minimizes the energy consumption of individual agents and yields an emergent V formation. As the formation emerges from the decentralized interaction between agents, our approach is robust to the spontaneous addition or removal of agents to the system. First, we present an analytical model for the trailing upwash behind a fixed-wing UAV, and we derive the optimal air speed for trailing UAVs to maximize their travel endurance. Next, we prove that simply flying at the optimal airspeed will never lead to emergent flocking behavior, and we propose a new decentralized "anseroid" behavior that yields emergent V formations. We encode these behaviors in a constraint-driven control algorithm that minimizes the locomotive power of each UAV. Finally, we prove that UAVs initialized in an approximate V or echelon formation will converge under our proposed control law, and we demonstrate this emergence occurs in real-time in simulation and in physical experiments with a fleet of Crazyflie quadrotors.


Seamlessly Integrating Factual Information and Social Content with Persuasive Dialogue

arXiv.org Artificial Intelligence

Complex conversation settings such as persuasion involve communicating changes in attitude or behavior, so users' perspectives need to be addressed, even when not directly related to the topic. In this work, we contribute a novel modular dialogue system framework that seamlessly integrates factual information and social content into persuasive dialogue. Our framework is generalizable to any dialogue task that has mixed social and task content. We conducted a study that compared user evaluations of our framework versus a baseline end-to-end generation model. We found that our framework was evaluated more favorably in all dimensions, including competence and friendliness, than the end-to-end model, which does not explicitly handle social content or factual questions.


An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret

arXiv.org Artificial Intelligence

Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately, previous algorithms either are not efficient or achieve sub-optimal regret in terms of the number of rounds $T$. We propose a new efficient algorithm with lower regret than even previous inefficient ones. For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$. This is an improvement over the previous approach, which has a regret bound of $\tilde{O}( \min(NK, \sqrt{N} K^{3/2})\sqrt{T})$. We also complement our efficient algorithm with an inefficient approach with $\tilde{O}(\sqrt{KT} + N^2K)$ regret. The experimental findings confirm the effectiveness of our efficient algorithm compared to the previous approaches.
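
The objective named in this abstract can be illustrated with a small sketch. It assumes the usual formulation in which a policy is a probability distribution over arms and the Nash social welfare is the product of the agents' expected rewards under that policy; the function names and the toy reward matrix are hypothetical, and no learning algorithm is shown.

```python
import math


def expected_utilities(policy, mean_rewards):
    """Expected reward of each agent when arms are drawn according to `policy`.

    mean_rewards[i][j] is agent i's (assumed known) mean reward for arm j.
    """
    return [sum(p * mu for p, mu in zip(policy, row)) for row in mean_rewards]


def nash_social_welfare(policy, mean_rewards):
    """Product of the agents' expected utilities."""
    return math.prod(expected_utilities(policy, mean_rewards))


# Toy instance: each of two agents values only "its own" arm.
mu = [[1.0, 0.0],
      [0.0, 1.0]]

always_arm_0 = [1.0, 0.0]
uniform = [0.5, 0.5]

total_unfair = sum(expected_utilities(always_arm_0, mu))  # 1.0
total_fair = sum(expected_utilities(uniform, mu))         # 1.0
nsw_unfair = nash_social_welfare(always_arm_0, mu)        # 0.0
nsw_fair = nash_social_welfare(uniform, mu)               # 0.25
```

Both policies have the same total utility here, but the product is zero whenever some agent gets nothing, which is why optimizing Nash social welfare favors the uniform policy.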


FedVLN: Privacy-preserving Federated Vision-and-Language Navigation

arXiv.org Artificial Intelligence

Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive information of users, such as house environments, human activities, etc. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local updates with the cloud server and other clients, and propose a novel federated vision-and-language navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. Particularly, we propose a decentralized training strategy to limit the data of each client to its local model training and a federated pre-exploration method to do partial model aggregation to improve model generalizability to unseen environments. Extensive results on R2R and RxR datasets show that under our FedVLN framework, decentralized VLN models achieve comparable results with centralized training while protecting seen environment privacy, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving unseen environment privacy.