Goto

Collaborating Authors

 Agents


Fast-Convergent Dynamics for Distributed Allocation of Resources Over Switching Sparse Networks with Quantized Communication Links

arXiv.org Artificial Intelligence

This paper proposes networked dynamics to solve resource allocation problems over time-varying multi-agent networks. The state of each agent represents the amount of used resources (or produced utilities) while the total amount of resources is fixed. The idea is to optimally allocate the resources among the group of agents by minimizing the overall cost function subject to fixed sum of resources. Each agents' information is restricted to its own state and cost function and those of its immediate in-neighbors. This is motivated by distributed applications such as mobile edge-computing, economic dispatch over smart grids, and multi-agent coverage control. This work provides a fast convergent solution (in comparison with linear dynamics) while considering relaxed network connectivity with quantized communication links. The proposed dynamics reaches optimal solution over switching (possibly disconnected) undirected networks as far as their union over some bounded non-overlapping time-intervals has a spanning-tree. We prove feasibility of the solution, uniqueness of the optimal state, and convergence to the optimal value under the proposed dynamics, where the analysis is applicable to similar 1st-order allocation dynamics with strongly sign-preserving nonlinearities, such as actuator saturation.


Active Audio-Visual Separation of Dynamic Sound Sources

arXiv.org Artificial Intelligence

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an object of interest. The agent hears a mixed stream of multiple audio sources (e.g., multiple people conversing and a band playing music at a noisy party). Given a limited time budget, it needs to extract the target sound accurately at every step using egocentric audio-visual observations. We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone to recover the dynamic target audio, using self-attention to make high-quality estimates for current timesteps and also simultaneously improve its past estimates. Using highly realistic acoustic SoundSpaces [14] simulations in real-world scanned Matterport3D [12] environments, we show that our model is able to learn efficient behavior to carry out continuous separation of a dynamic audio target.


Latency-Aware Collaborative Perception

arXiv.org Artificial Intelligence

Collaborative perception has recently shown great potential to improve perception capabilities over single-agent perception. Existing collaborative perception methods usually consider an ideal communication environment. However, in practice, the communication system inevitably suffers from latency issues, causing potential performance degradation and high risks in safety-critical applications, such as autonomous driving. To mitigate the effect caused by the inevitable latency, from a machine learning perspective, we present the first latency-aware collaborative perception system, which actively adapts asynchronous perceptual features from multiple agents to the same time stamp, promoting the robustness and effectiveness of collaboration. To achieve such a feature-level synchronization, we propose a novel latency compensation module, called SyncNet, which leverages feature-attention symbiotic estimation and time modulation techniques. Experiments results show that the proposed latency aware collaborative perception system with SyncNet can outperforms the state-of-the-art collaborative perception method by 15.6% in the communication latency scenario and keep collaborative perception being superior to single agent perception under severe latency.


Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

arXiv.org Artificial Intelligence

While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{K})$.


Learning effective stochastic differential equations from microscopic simulations: linking stochastic numerics to deep learning

arXiv.org Artificial Intelligence

We identify effective stochastic differential equations (SDE) for coarse observables of fine-grained particle- or agent-based simulations; these SDE then provide useful coarse surrogate models of the fine scale dynamics. We approximate the drift and diffusivity functions in these effective SDE through neural networks, which can be thought of as effective stochastic ResNets. The loss function is inspired by, and embodies, the structure of established stochastic numerical integrators (here, Euler-Maruyama and Milstein); our approximations can thus benefit from backward error analysis of these underlying numerical schemes. They also lend themselves naturally to "physics-informed" gray-box identification when approximate coarse models, such as mean field equations, are available. Existing numerical integration schemes for Langevin-type equations and for stochastic partial differential equations (SPDE) can also be used for training; we demonstrate this on a stochastically forced oscillator and the stochastic wave equation. Our approach does not require long trajectories, works on scattered snapshot data, and is designed to naturally handle different time steps per snapshot. We consider both the case where the coarse collective observables are known in advance, as well as the case where they must be found in a data-driven manner.


Distributed Projection-free Algorithm for Constrained Aggregative Optimization

arXiv.org Artificial Intelligence

In this paper, we focus on solving a distributed convex aggregative optimization problem in a network, where each agent has its own cost function which depends not only on its own decision variables but also on the aggregated function of all agents' decision variables. The decision variable is constrained within a feasible set. In order to minimize the sum of the cost functions when each agent only knows its local cost function, we propose a distributed Frank-Wolfe algorithm based on gradient tracking for the aggregative optimization problem where each node maintains two estimates, namely an estimate of the sum of agents' decision variable and an estimate of the gradient of global function. The algorithm is projection-free, but only involves solving a linear optimization to get a search direction at each step. We show the convergence of the proposed algorithm for convex and smooth objective functions over a time-varying network. Finally, we demonstrate the convergence and computational efficiency of the proposed algorithm via numerical simulations.


HARL: A Novel Hierachical Adversary Reinforcement Learning for Automoumous Intersection Management

arXiv.org Artificial Intelligence

As an emerging technology, Connected Autonomous Vehicles (CAVs) are believed to have the ability to move through intersections in a faster and safer manner, through effective Vehicle-to-Everything (V2X) communication and global observation. Autonomous intersection management is a key path to efficient crossing at intersections, which reduces unnecessary slowdowns and stops through adaptive decision process of each CAV, enabling fuller utilization of the intersection space. Distributed reinforcement learning (DRL) offers a flexible, end-to-end model for AIM, adapting for many intersection scenarios. While DRL is prone to collisions as the actions of multiple sides in the complicated interactions are sampled from a generic policy, restricting the application of DRL in realistic scenario. To address this, we propose a hierarchical RL framework where models at different levels vary in receptive scope, action step length, and feedback period of reward. The upper layer model accelerate CAVs to prevent them from being clashed, while the lower layer model adjust the trends from upper layer model to avoid the change of mobile state causing new conflicts. And the real action of CAV at each step is co-determined by the trends from both levels, forming a real-time balance in the adversarial process. The proposed model is proven effective in the experiment undertaken in a complicated intersection with 4 branches and 4 lanes each branch, and show better performance compared with baselines.


Initial Orbit Determination for the CR3BP using Particle Swarm Optimization

arXiv.org Artificial Intelligence

This work utilizes a particle swarm optimizer (PSO) for initial orbit determination for a chief and deputy scenario in the circular restricted three-body problem (CR3BP). The PSO is used to minimize the difference between actual and estimated observations and knowledge of the chief's position with known CR3BP dynamics to determine the deputy's initial state. Convergence is achieved through limiting particle starting positions to feasible positions based on the known chief position, and sensor constraints. Parallel and GPU processing methods are used to improve computation time and provide an accurate initial state estimate for a variety of cislunar orbit geometries.


JSwarm: A Jingulu-Inspired Human-AI-Teaming Language for Context-Aware Swarm Guidance

#artificialintelligence

Bi-directional communication between humans and swarm systems begs for efficient languages to communicate information between the humans and the Artificial Intelligence (AI)-enabled agents in a manner that is most appropriate for the context. We discuss the criteria for effective teaming and functional bi-directional communication between humans and AI, and the design choices required to create effective languages. We then present a human-AI-teaming communication language inspired by the Australian Aboriginal language of Jingulu, which we call JSwarm. We present the motivation and structure of the language. An example is used to demonstrate how the language operates for a shepherding swarm guidance task.


Policy Optimization for Markov Games: Unified Framework and Faster Convergence

arXiv.org Artificial Intelligence

This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game, as long as the matrix game algorithms achieve low weighted regret at each state, with respect to weights determined by the speed of the value updates. Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an $\mathcal{\widetilde{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and a similar algorithm with slightly modified value update rule achieves a faster $\mathcal{\widetilde{O}}(T^{-1})$ convergence rate. These improve over the current best $\mathcal{\widetilde{O}}(T^{-1/2})$ rate of symmetric policy optimization type algorithms. We also extend this algorithm to multi-player general-sum Markov Games and show an $\mathcal{\widetilde{O}}(T^{-3/4})$ convergence rate to Coarse Correlated Equilibria (CCE). Finally, we provide a numerical example to verify our theory and investigate the importance of smooth value updates, and find that using "eager" value updates instead (equivalent to the independent natural policy gradient algorithm) may significantly slow down the convergence, even on a simple game with $H=2$ layers.