Goto

Collaborating Authors

 Agents


VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

arXiv.org Artificial Intelligence

Predicting future behaviors of road agents is a key task in autonomous driving. While existing models have demonstrated great success in predicting marginal agent future behaviors, it remains a challenge to efficiently predict consistent joint behaviors of multiple agents. Recently, the occupancy flow fields representation was proposed to represent joint future states of road agents through a combination of occupancy grid and flow, which supports efficient and consistent joint predictions. In this work, we propose a novel occupancy flow fields predictor to produce accurate occupancy and flow predictions, by combining the power of an image encoder that learns features from a rasterized traffic image and a vector encoder that captures information of continuous agent trajectories and map states. The two encoded features are fused by multiple attention modules before generating final predictions. Our simple but effective model ranks 3rd place on the Waymo Open Dataset Occupancy and Flow Prediction Challenge, and achieves the best performance in the occluded occupancy and flow prediction task.


Maximum Correntropy Value Decomposition for Multi-agent Deep Reinforcemen Learning

arXiv.org Artificial Intelligence

We explore value decomposition solutions for multi-agent deep reinforcement learning in the popular paradigm of centralized training with decentralized execution(CTDE). As the recognized best solution to CTDE, Weighted QMIX is cutting-edge on StarCraft Multi-agent Challenge (SMAC), with a weighting scheme implemented on QMIX to place more emphasis on the optimal joint actions. However, the fixed weight requires manual tuning according to the application scenarios, which painfully prevents Weighted QMIX from being used in broader engineering applications. In this paper, we first demonstrate the flaw of Weighted QMIX using an ordinary One-Step Matrix Game (OMG), that no matter how the weight is chosen, Weighted QMIX struggles to deal with non-monotonic value decomposition problems with a large variance of reward distributions. Then we characterize the problem of value decomposition as an Underfitting One-edged Robust Regression problem and make the first attempt to give a solution to the value decomposition problem from the perspective of information-theoretical learning. We introduce the Maximum Correntropy Criterion (MCC) as a cost function to dynamically adapt the weight to eliminate the effects of minimum in reward distributions. We simplify the implementation and propose a new algorithm called MCVD. A preliminary experiment conducted on OMG shows that MCVD could deal with non-monotonic value decomposition problems with a large tolerance of kernel bandwidth selection. Further experiments are carried out on Cooperative-Navigation and multiple SMAC scenarios, where MCVD exhibits unprecedented ease of implementation, broad applicability, and stability.


Quantization enabled Privacy Protection in Decentralized Stochastic Optimization

arXiv.org Artificial Intelligence

By enabling multiple agents to cooperatively solve a global optimization problem in the absence of a central coordinator, decentralized stochastic optimization is gaining increasing attention in areas as diverse as machine learning, control, and sensor networks. Since the associated data usually contain sensitive information, such as user locations and personal identities, privacy protection has emerged as a crucial need in the implementation of decentralized stochastic optimization. In this paper, we propose a decentralized stochastic optimization algorithm that is able to guarantee provable convergence accuracy even in the presence of aggressive quantization errors that are proportional to the amplitude of quantization inputs. The result applies to both convex and non-convex objective functions, and enables us to exploit aggressive quantization schemes to obfuscate shared information, and hence enables privacy protection without losing provable optimization accuracy. In fact, by using a {stochastic} ternary quantization scheme, which quantizes any value to three numerical levels, we achieve quantization-based rigorous differential privacy in decentralized stochastic optimization, which has not been reported before. In combination with the presented quantization scheme, the proposed algorithm ensures, for the first time, rigorous differential privacy in decentralized stochastic optimization without losing provable convergence accuracy. Simulation results for a distributed estimation problem as well as numerical experiments for decentralized learning on a benchmark machine learning dataset confirm the effectiveness of the proposed approach.


Post-Disaster Repair Crew Assignment Optimization Using Minimum Latency

arXiv.org Artificial Intelligence

Across infrastructure domains, physical damage caused by storms and other weather events often requires costly and time-sensitive repairs to restore services as quickly as possible. While recent studies have used agent-based models to estimate the cost of repairs, the implemented strategies for assignment of repair crews to different locations are generally human-driven or based on simple rules. In order to find performant strategies, we continue with an agent-based model, but approach this problem as a combinational optimization problem known as the Minimum Weighted Latency Problem for multiple repair crews. We apply a partitioning algorithm that balances the assignment of targets amongst all the crews using two different heuristics that optimize either the importance of repair locations or the travel time between them. We benchmark our algorithm on both randomly generated graphs as well as data derived from a real-world urban environment, and show that our algorithm delivers significantly better assignments than existing methods.


Socially Intelligent Genetic Agents for the Emergence of Explicit Norms

arXiv.org Artificial Intelligence

Norms help regulate a society. Norms may be explicit (represented in structured form) or implicit. We address the emergence of explicit norms by developing agents who provide and reason about explanations for norm violations in deciding sanctions and identifying alternative norms. These agents use a genetic algorithm to produce norms and reinforcement learning to learn the values of these norms. We find that applying explanations leads to norms that provide better cohesion and goal satisfaction for the agents. Our results are stable for societies with differing attitudes of generosity.


Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards developing more general and more powerful MARL algorithms. Check our project website at https://sites.google.com/view/revisiting-marl.


Learning Connectivity-Maximizing Network Configurations

arXiv.org Artificial Intelligence

In this letter we propose a data-driven approach to optimizing the algebraic connectivity of a team of robots. While a considerable amount of research has been devoted to this problem, we lack a method that scales in a manner suitable for online applications for more than a handful of agents. To that end, we propose a supervised learning approach with a convolutional neural network (CNN) that learns to place communication agents from an expert that uses an optimization-based strategy. We demonstrate the performance of our CNN on canonical line and ring topologies, 105k randomly generated test cases, and larger teams not seen during training. We also show how our system can be applied to dynamic robot teams through a Unity-based simulation. After training, our system produces connected configurations over an order of magnitude faster than the optimization-based scheme for teams of 10-20 agents.


Cooperative Reinforcement Learning on Traffic Signal Control

arXiv.org Artificial Intelligence

Traffic signal control is a challenging real-world problem aiming to minimize overall travel time by coordinating vehicle movements at road intersections. Existing traffic signal control systems in use still rely heavily on oversimplified information and rule-based methods. Specifically, the periodicity of green/red light alternations can be considered as a prior for better planning of each agent in policy optimization. To better learn such adaptive and predictive priors, traditional RL-based methods can only return a fixed length from predefined action pool with only local agents. If there is no cooperation between these agents, some agents often make conflicts to other agents and thus decrease the whole throughput. This paper proposes a cooperative, multi-objective architecture with age-decaying weights to better estimate multiple reward terms for traffic signal control optimization, which termed COoperative Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (COMMA-DDPG). Two types of agents running to maximize rewards of different goals - one for local traffic optimization at each intersection and the other for global traffic waiting time optimization. The global agent is used to guide the local agents as a means for aiding faster learning but not used in the inference phase. We also provide an analysis of solution existence together with convergence proof for the proposed RL optimization. Evaluation is performed using real-world traffic data collected using traffic cameras from an Asian country. Our method can effectively reduce the total delayed time by 60\%. Results demonstrate its superiority when compared to SoTA methods.


Developing Optimal Causal Cyber-Defence Agents via Cyber Security Simulation

arXiv.org Artificial Intelligence

In this paper we explore cyber security defence, through the unification of a novel cyber security simulator with models for (causal) decision-making through optimisation. Particular attention is paid to a recently published approach: dynamic causal Bayesian optimisation (DCBO). We propose that DCBO can act as a blue agent when provided with a view of a simulated network and a causal model of how a red agent spreads within that network. To investigate how DCBO can perform optimal interventions on host nodes, in order to reduce the cost of intrusions caused by the red agent. Through this we demonstrate a complete cyber-simulation system, which we use to generate observational data for DCBO and provide numerical quantitative results which lay the foundations for future work in this space.


A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning

arXiv.org Artificial Intelligence

Multiagent reinforcement learning (MARL) can solve complex cooperative tasks. However, the efficiency of existing MARL methods relies heavily on well-defined reward functions. Multiagent tasks with sparse reward feedback are especially challenging not only because of the credit distribution problem, but also due to the low probability of obtaining positive reward feedback. In this paper, we design a graph network called Cooperation Graph (CG). The Cooperation Graph is the combination of two simple bipartite graphs, namely, the Agent Clustering subgraph (ACG) and the Cluster Designating subgraph (CDG). Next, based on this novel graph structure, we propose a Cooperation Graph Multiagent Reinforcement Learning (CG-MARL) algorithm, which can efficiently deal with the sparse reward problem in multiagent tasks. In CG-MARL, agents are directly controlled by the Cooperation Graph. And a policy neural network is trained to manipulate this Cooperation Graph, guiding agents to achieve cooperation in an implicit way. This hierarchical feature of CG-MARL provides space for customized cluster-actions, an extensible interface for introducing fundamental cooperation knowledge. In experiments, CG-MARL shows state-of-the-art performance in sparse reward multiagent benchmarks, including the anti-invasion interception task and the multi-cargo delivery task.