Agents


Adaptation Strategy for a Distributed Autonomous UAV Formation in Case of Aircraft Loss

arXiv.org Artificial Intelligence

Controlling a distributed autonomous unmanned aerial vehicle (UAV) formation is usually considered in the context of recovering the connectivity graph should a single UAV agent be lost. At the same time, little attention is paid to how such a loss affects the dynamics of the formation as a system. To compensate for the negative effects, we propose an adaptation algorithm that reduces the increasing interaction between the UAV agents that remain in the formation. This algorithm enables the autonomous system to adjust to the new equilibrium state. The algorithm has been tested by computer simulation on full nonlinear UAV models. Simulation results show that the negative effect (the increased final cruising speed of the formation) is completely eliminated.


Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

arXiv.org Artificial Intelligence

We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPG). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity, which does not explicitly depend on the state space size. When the exact gradient is not available, we establish an $O(1/\epsilon^5)$ sample complexity bound in a potentially infinitely large state space for a sample-based algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games with players that are oblivious to the types of games being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.
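To make the idea of players running policy gradient "in tandem" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the simplest possible setting: a single-state, two-player game with a shared payoff matrix, which is a trivial instance of a Markov potential game. Each player takes a projected gradient ascent step on its own mixed strategy using only its own gradient. The names `project_simplex`, `independent_pg`, and `nash_gap`, the step size, and the example payoff matrix are all illustrative assumptions.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def independent_pg(payoff, steps=500, lr=0.1):
    """Simultaneous projected gradient ascent on a shared payoff x^T A y.

    Each player updates using only the gradient with respect to its own
    strategy; neither player needs to know what kind of game is played.
    """
    n, m = payoff.shape
    x = np.full(n, 1.0 / n)                   # player 1 starts uniform
    y = np.full(m, 1.0 / m)                   # player 2 starts uniform
    for _ in range(steps):
        grad_x = payoff @ y                   # d(x^T A y)/dx
        grad_y = payoff.T @ x                 # d(x^T A y)/dy
        x, y = (project_simplex(x + lr * grad_x),
                project_simplex(y + lr * grad_y))
    return x, y


def nash_gap(payoff, x, y):
    """Largest gain either player could get by deviating unilaterally."""
    value = x @ payoff @ y
    return max((payoff @ y).max() - value, (payoff.T @ x).max() - value)


# Hypothetical 2x2 identical-interest game used only for illustration.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
x, y = independent_pg(A)
print("strategies:", x, y, "Nash gap:", nash_gap(A, x, y))
```

A strategy pair is an $\epsilon$-Nash equilibrium when `nash_gap` is at most $\epsilon$; the sketch says nothing about the iteration or sample complexity bounds stated in the abstract.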


Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

To collaborate well with robots, we must be able to understand their decision making. Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior in a way that resembles inverse reinforcement learning (IRL). Thus, robots can convey their beliefs and desires by providing demonstrations that are informative for a human learner's IRL. An informative demonstration is one that differs strongly from the learner's expectations of what the robot will do given their current understanding of the robot's decision making. However, standard IRL does not model the learner's existing expectations, and thus cannot do this counterfactual reasoning. We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL, so that a robot can select demonstrations that maximize the human's understanding. We also propose a novel measure for estimating the difficulty for a human to predict instances of a robot's behavior in unseen environments. A user study finds that our test difficulty measure correlates well with human performance and confidence. Interestingly, considering human beliefs and counterfactuals when selecting demonstrations decreases human performance on easy tests, but increases performance on difficult tests, providing insight on how to best utilize such models.


Adaptive Latent Factor Analysis via Generalized Momentum-Incorporated Particle Swarm Optimization

arXiv.org Artificial Intelligence

The stochastic gradient descent (SGD) algorithm is an effective learning strategy for building a latent factor analysis (LFA) model on a high-dimensional and incomplete (HDI) matrix. A particle swarm optimization (PSO) algorithm is commonly adopted to make an SGD-based LFA model's hyper-parameters, i.e., the learning rate and regularization coefficient, self-adaptive. However, a standard PSO algorithm may suffer from accuracy loss caused by premature convergence. To address this issue, this paper incorporates more historical information into each particle's evolutionary process to avoid premature convergence, following the principle of a generalized-momentum (GM) method, thereby achieving a novel GM-incorporated PSO (GM-PSO). With it, a GM-PSO-based LFA (GMPL) model is further achieved to implement efficient self-adaptation of hyper-parameters. The experimental results on three HDI matrices demonstrate that the GMPL model achieves a higher prediction accuracy for missing data estimation in industrial applications.
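For readers unfamiliar with the baseline, here is a minimal sketch of a standard PSO loop searching over two hyper-parameters. It is deliberately the plain PSO that the abstract describes as prone to premature convergence, not the GM-PSO variant, whose update rule is not given here. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of an SGD-based LFA model, and the inertia and acceleration constants are common textbook choices rather than values from the paper.

```python
import numpy as np


def pso(objective, lower, upper, n_particles=20, iters=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO minimizing `objective` inside a box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                          # per-particle best position
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()         # swarm-wide best position

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: inertia term + pull toward personal best + pull toward global best.
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[best_val.argmin()].copy()
    return g, best_val.min()


def toy_validation_error(params):
    """Hypothetical surrogate: smallest at learning rate 0.01, regularization 0.05."""
    learning_rate, reg = params
    return (learning_rate - 0.01) ** 2 + (reg - 0.05) ** 2


best, err = pso(toy_validation_error, lower=[0.0, 0.0], upper=[1.0, 1.0])
print("best (learning rate, regularization):", best, "error:", err)
```

In a real setting the objective would train or partially train the LFA model for each candidate pair, which is far more expensive than this toy surrogate.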


Transferable Multi-Agent Reinforcement Learning with Dynamic Participating Agents

arXiv.org Artificial Intelligence

We study multi-agent reinforcement learning (MARL) with centralized training and decentralized execution. During the training, new agents may join, and existing agents may unexpectedly leave the training. In such situations, a standard deep MARL model must be trained again from scratch, which is very time-consuming. To tackle this problem, we propose a special network architecture with a few-shot learning algorithm that allows the number of agents to vary during centralized training. In particular, when a new agent joins the centralized training, our few-shot learning algorithm trains its policy network and value network using a small number of samples; when an agent leaves the training, the training process of the remaining agents is not affected. Our experiments show that using the proposed network architecture and algorithm, model adaptation when new agents join can be 100+ times faster than the baseline. Our work is applicable to any setting, including cooperative, competitive, and mixed.


Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

arXiv.org Artificial Intelligence

Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon adversarial team Markov games, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/\epsilon$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB '97).


Finite-time Motion Planning of Multi-agent Systems with Collision Avoidance

arXiv.org Artificial Intelligence

Finite-time motion planning with collision avoidance is a challenging issue in multi-agent systems. This paper proposes a novel distributed controller based on a new Lyapunov barrier function which guarantees finite-time stability for multi-agent systems without collisions. First, the problem of finite-time motion planning of multi-agent systems is formulated. Then, a novel finite-time distributed controller is developed based on a Lyapunov barrier function. Finally, numerical simulations demonstrate the effectiveness of the proposed method.


MMFN: Multi-Modal-Fusion-Net for End-to-End Driving

arXiv.org Artificial Intelligence

Inspired by the fact that humans use diverse sensory organs to perceive the world, sensors with different modalities are deployed in end-to-end driving to obtain the global context of the 3D scene. In previous works, camera and LiDAR inputs are fused through transformers for better driving performance. These inputs are normally further interpreted as high-level map information to assist navigation tasks. Nevertheless, extracting useful information from the complex map input is challenging, for redundant information may mislead the agent and negatively affect driving performance. We propose a novel approach to efficiently extract features from vectorized High-Definition (HD) maps and utilize them in the end-to-end driving tasks. In addition, we design a new expert to further enhance the model performance by considering multi-road rules. Experimental results prove that both of the proposed improvements enable our agent to achieve superior performance compared with other methods.


Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding

arXiv.org Artificial Intelligence

Modeling virtual agents with behavior style is one factor for personalizing human-agent interaction. In this paper, we propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database containing videos of various speakers. We view style as being pervasive while speaking; it colors the expressivity of communicative behaviors, while speech content is carried by multimodal signals and text. This disentanglement scheme of content and style allows us to directly infer the style embedding even of a speaker whose data are not part of the training phase, without requiring any further training or fine-tuning. The first goal of our model is to generate the gestures of a source speaker based on the content of two input modalities - Mel spectrogram and text semantics. The second goal is to condition the source speaker's predicted gestures on the multimodal behavior style embedding of a target speaker. The third goal is to allow zero-shot style transfer of speakers unseen during training without re-training the model. Our system consists of two main components: (1) a speaker style encoder network that learns to generate a fixed-dimensional speaker style embedding from a target speaker's multimodal data (mel-spectrogram, pose, and text); and (2) a sequence-to-sequence synthesis network that synthesizes gestures based on the content of the input modalities - text and mel-spectrogram - of a source speaker, and conditioned on the speaker style embedding. We verify that our model is able to synthesize gestures of a source speaker given the two input modalities, and transfer the knowledge of target speaker style variability learned by the speaker style encoder to the gesture generation task in a zero-shot setup, indicating that the model has learned a high quality speaker representation. For our evaluation we convert the 2D generated gestures to 3D poses, and produce 3D animations of the generated gestures. We conduct objective and subjective evaluations to validate our approach and compare it with baselines.

Keywords: audio and text driven gesture synthesis, zero-shot style transfer, embodied conversational agents

1 INTRODUCTION

Human behavior style is a socially meaningful clustering of features found within and across multiple modalities, specifically in linguistic [7], spoken behavior such as the speaking style conveyed by speech prosody [29, 33], and nonverbal behavior such as hand gestures and body posture [32, 42].


A Game-Theoretic Approach for Hierarchical Epidemic Control

arXiv.org Artificial Intelligence

Democratic governments and institutions typically have a hierarchical structure. For example, policies in the U.S., Canada, and many European democracies emerge from complex interactions among the federal and state governments, as well as county boards, city councils and mayors. Such interactions are characterized by inherent asymmetries across different levels of the hierarchy. On the one hand, the specifics of policy formulation and enforcement (e.g., training and deployment of personnel and updating of infrastructure) are generally in the hands of administrative bodies at lower levels of the hierarchy -- often the lowest level -- for practical reasons; actions these entities take are the ones that truly matter in the sense that they directly impact costs and benefits realized at all levels. On the other hand, entities at higher levels may have the power to impose constraints in some form or another on the policy-makers within their immediate jurisdiction (e.g., the U.S. federal government can constrain state policies); violations of these constraints, in turn, entail a noncompliance cost to the violator, such as legal costs, penalties, or reputation loss. Examples of such hierarchical policy structure arise in the spheres of education (e.g., topics to be included in primary education), healthcare (e.g., vaccination) and immigration. A preeminent recent example of such hierarchical policy-making is the response to the ongoing COVID-19 pandemic in countries with decentralized administration. Policies concerning social distancing, masking and vaccination have involved recommendations at the federal level, guidelines and restrictions at the state/province/district level, and measures adopted by specific counties, cities or even individual businesses and schools. In general, policies are contentious.