Agents
Understanding reinforcement learned crowds
Kwiatkowski, Ariel, Kalogeiton, Vicky, Pettré, Julien, Cani, Marie-Paule
Simulating trajectories of virtual crowds is a commonly encountered task in Computer Graphics. Several recent works have applied Reinforcement Learning methods to animate virtual agents; however, they often make different design choices when it comes to the fundamental simulation setup. Each of these choices comes with a reasonable justification for its use, so it is not obvious what their real impact is or how they affect the results. In this work, we analyze some of these arbitrary choices in terms of their impact on the learning performance, as well as the quality of the resulting simulation measured in terms of energy efficiency. We perform a theoretical analysis of the properties of the reward function design, and empirically evaluate the impact of using certain observation and action spaces on a variety of scenarios, with the reward function and energy usage as metrics. We show that directly using the neighboring agents' information as observation generally outperforms the more widely used raycasting. Similarly, using nonholonomic controls with egocentric observations tends to produce more efficient behaviors than holonomic controls with absolute observations. Each of these choices has a significant and potentially nontrivial impact on the results, so researchers should be mindful about choosing and reporting them in their work.
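The distinction between the two control schemes and the two observation frames compared in this abstract can be illustrated with a minimal sketch. The function names, the Euler integration, and the "x forward, y left" frame convention are assumptions made for illustration; the abstract does not specify the paper's actual simulation setup.

```python
import math

def step_holonomic(x, y, vx, vy, dt):
    """Holonomic control: the action is a world-frame velocity vector,
    so the agent can move in any direction at any time."""
    return x + vx * dt, y + vy * dt

def step_nonholonomic(x, y, heading, speed, turn_rate, dt):
    """Nonholonomic control: the action is a forward speed and a turn rate,
    so the agent can only move along its current heading."""
    heading = heading + turn_rate * dt
    x = x + speed * math.cos(heading) * dt
    y = y + speed * math.sin(heading) * dt
    return x, y, heading

def egocentric(agent_x, agent_y, heading, point_x, point_y):
    """Express a world-frame point in the agent's own frame
    (assumed convention: x forward, y to the left)."""
    dx, dy = point_x - agent_x, point_y - agent_y
    c, s = math.cos(heading), math.sin(heading)
    return c * dx + s * dy, -s * dx + c * dy
```

For example, an agent at the origin facing along the world y-axis sees a goal at (0, 1) as lying directly ahead, at roughly (1, 0) in its own frame, whereas an absolute observation would report (0, 1) regardless of heading.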
Measuring Interventional Robustness in Reinforcement Learning
Avery, Katherine, Kenney, Jack, Amaranath, Pracheta, Cai, Erica, Jensen, David
Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and the type of algorithm and that, contrary to what one might expect, high performance does not imply high IR.
Autonomous Task Planning for Heterogeneous Multi-Agent Systems
Tziola, Anatoli A., Loizou, Savvas G.
This paper presents a solution to the automatic task planning problem for multi-agent systems. A formal framework is developed based on Nondeterministic Finite Automata with $\epsilon$-transitions, where given the capabilities, constraints and failure modes of the agents involved, an initial state of the system and a task specification, an optimal solution is generated that satisfies the system constraints and the task specification. The resulting solution is guaranteed to be complete and optimal; moreover, a heuristic solution is proposed that significantly reduces the computational requirements while relaxing the completeness and optimality requirements. The constructed system model is independent of the initial condition and the task specification, alleviating the need to repeat the costly pre-processing cycle for solving other scenarios, while allowing the incorporation of failure modes on-the-fly. Two case studies are provided: a simple one to showcase the concepts of the proposed methodology and a more elaborate one to demonstrate the effectiveness and validity of the methodology.
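As background for the automaton model named above, the following sketch shows the standard $\epsilon$-closure and transition step of a nondeterministic finite automaton with $\epsilon$-transitions. It illustrates only the textbook definition; how the paper encodes agent capabilities, constraints, and failure modes as states and transitions is not stated in the abstract, and the state names and dictionary layout here are hypothetical.

```python
def epsilon_closure(states, eps_transitions):
    """Return every state reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in eps_transitions.get(state, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

def step(states, symbol, transitions, eps_transitions):
    """Consume one input symbol from a set of current states."""
    moved = set()
    for state in epsilon_closure(states, eps_transitions):
        moved |= set(transitions.get((state, symbol), ()))
    return epsilon_closure(moved, eps_transitions)

# Hypothetical toy automaton: q0 -eps-> q1 -eps-> q2 -a-> q3
eps_transitions = {"q0": ["q1"], "q1": ["q2"]}
transitions = {("q2", "a"): ["q3"]}
reachable = step({"q0"}, "a", transitions, eps_transitions)
```

Here `reachable` is the set containing only `q3`, because the symbol `a` can be consumed only after following the two $\epsilon$-moves from `q0`.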
Diffusion of Information on Networked Lattices by Gossip
We study time-dependent dynamics on a network of order lattices, where structure-preserving lattice maps are used to fuse lattice-valued data over vertices and edges. The principal contribution is a novel asynchronous Laplacian, generalizing the usual graph Laplacian, adapted to a network of heterogeneous lattices. The resulting gossip algorithm is shown to converge asymptotically to stable "harmonic" distributions of lattice data. This general theorem is applicable to several general problems, including lattice-valued consensus, Kripke semantics, and threat detection, all using asynchronous local update rules. The use of the graph Laplacian to diffuse information over networks is well-established in classical and contemporary work ranging from opinion dynamics [1] to distributed multi-agent consensus [2] and control [3], synchronization [4], [5], flocking [6], and much more. In the past decade, Laplacians that are adapted to handle vector-valued data, such as graph connection Laplacians [7], [8] or matrix-weighted Laplacians [9], have been revolutionary in signal processing [10], [11] and machine learning [12], [13]. While the ultimate form of a generalized Laplacian is as yet not present in applications, there are hints of a broader theory finding its way from algebraic topology to data science. The Laplacian from calculus class and the graph Laplacian are two extreme examples of a Hodge Laplacian.
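For readers unfamiliar with the classical setting that this work generalizes, the sketch below shows ordinary graph-Laplacian diffusion on real-valued data, followed by a toy asynchronous update on a lattice of sets ordered by inclusion. The second function is only an illustrative analogue chosen for this note; it is not the asynchronous Laplacian constructed in the paper, and the graph, step size, and update schedule are all assumptions.

```python
def laplacian_step(values, neighbors, eps):
    """One synchronous step x <- x - eps * L x of graph-Laplacian diffusion."""
    return [
        x + eps * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]

def join_gossip(values, neighbors, schedule):
    """Toy asynchronous update on the lattice of sets under inclusion:
    each scheduled node replaces its value by the join (union) of its own
    value and its neighbors' current values."""
    values = [set(v) for v in values]
    for i in schedule:
        for j in neighbors[i]:
            values[i] |= values[j]
    return values

neighbors = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2

# Real-valued consensus: with eps below 1 / (max degree), values approach the mean.
x = [0.0, 3.0, 6.0]
for _ in range(200):
    x = laplacian_step(x, neighbors, eps=0.25)

# Set-valued gossip: after visiting nodes 1, 0, 2 every node holds the same set.
sets = join_gossip([{"a"}, set(), {"b"}], neighbors, schedule=[1, 0, 2])
```

In the real-valued case the values converge to the average 3.0; in the set-valued case every node ends up holding the union of the initial data, which is a fixed point of the local update.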
AdvDO: Realistic Adversarial Attacks for Trajectory Prediction
Cao, Yulong, Xiao, Chaowei, Anandkumar, Anima, Xu, Danfei, Pavone, Marco
Trajectory prediction is essential for autonomous vehicles (AVs) to plan correct and safe driving behaviors. While many prior works aim to achieve higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. We devise an optimization-based adversarial attack framework that leverages a carefully designed differentiable dynamic model to generate realistic adversarial trajectories. Empirically, we benchmark the adversarial robustness of state-of-the-art prediction models and show that our attack increases the prediction error for general metrics and planning-aware metrics by more than 50% and 37%, respectively. We also show that our attack can lead an AV to drive off road or collide with other vehicles in simulation. Finally, we demonstrate how to mitigate the adversarial attacks using an adversarial training scheme.
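One way to read the role of the dynamic model is that the attack perturbs control inputs rather than raw positions, so every perturbed trajectory is still something a vehicle could drive. The sketch below illustrates that idea with a simple kinematic unicycle model; the choice of model, the control limits, and all names are assumptions for illustration and are not taken from the paper. It is written in plain Python, whereas an actual attack would use an automatic-differentiation framework to optimize the perturbation.

```python
import math

def rollout(state, controls, dt=0.1):
    """Integrate a kinematic unicycle model.
    state = (x, y, heading, speed); each control = (acceleration, yaw_rate)."""
    x, y, heading, speed = state
    trajectory = [(x, y)]
    for accel, yaw_rate in controls:
        speed += accel * dt
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        trajectory.append((x, y))
    return trajectory

def clip(value, limit):
    """Clamp value to the symmetric interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def perturb(controls, deltas, max_accel=3.0, max_yaw_rate=0.5):
    """Add a perturbation to each control and keep the result within
    (hypothetical) physical limits."""
    return [
        (clip(a + da, max_accel), clip(w + dw, max_yaw_rate))
        for (a, w), (da, dw) in zip(controls, deltas)
    ]

nominal = [(0.0, 0.0)] * 10
adversarial = perturb(nominal, [(0.0, 0.2)] * 10)
path = rollout((0.0, 0.0, 0.0, 1.0), adversarial)
```

Because the perturbation acts on bounded accelerations and yaw rates, the resulting path bends smoothly away from the nominal straight line instead of jumping between arbitrary positions.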
Safety-driven Interactive Planning for Neural Network-based Lane Changing
Liu, Xiangguo, Jiao, Ruochen, Zheng, Bowen, Liang, Dave, Zhu, Qi
Neural network-based driving planners have shown great promise in improving the task performance of autonomous driving. However, it is critical and yet very challenging to ensure the safety of systems with neural network-based components, especially in dense and highly interactive traffic environments. In this work, we propose a safety-driven interactive planning framework for neural network-based lane changing. To prevent overly conservative planning, we identify the driving behavior of surrounding vehicles and assess their aggressiveness, and then adapt the planned trajectory for the ego vehicle accordingly in an interactive manner. The ego vehicle can proceed to change lanes if a safe evasion trajectory exists even in the predicted worst case; otherwise, it can stay around the current lateral position or return to the original lane. We quantitatively demonstrate the effectiveness of our planner design and its advantage over baseline methods through extensive simulations with diverse and comprehensive experimental settings, as well as in real-world scenarios collected by an autonomous vehicle company.
Distributed Semi-supervised Fuzzy Regression with Interpolation Consistency Regularization
Shi, Ye, Zhang, Leijie, Cao, Zehong, Tanveer, M., Lin, Chin-Teng
Recently, distributed semi-supervised learning (DSSL) algorithms have shown their effectiveness in leveraging unlabeled samples over interconnected networks, where agents cannot share their original data with each other and can only communicate non-sensitive information with their neighbors. However, existing DSSL algorithms cannot cope with data uncertainties and may suffer from high computation and communication overhead. To handle these issues, we propose a distributed semi-supervised fuzzy regression (DSFR) model with fuzzy if-then rules and interpolation consistency regularization (ICR). ICR, which was proposed recently for semi-supervised problems, can force decision boundaries to pass through sparse data areas, thus increasing model robustness. However, its application in distributed scenarios has not been considered yet. In this work, we propose a distributed Fuzzy C-means (DFCM) method and a distributed interpolation consistency regularization (DICR), both built on the well-known alternating direction method of multipliers, to locate the parameters in the antecedent and consequent components of DSFR, respectively. Notably, the DSFR model converges very fast since it does not involve a back-propagation procedure, and it is scalable to large-scale datasets thanks to the use of DFCM and DICR. Experimental results on both artificial and real-world datasets show that the proposed DSFR model can achieve much better performance than the state-of-the-art DSSL algorithm in terms of both loss value and computational cost.
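The core idea of interpolation consistency regularization can be sketched in a few lines: the prediction at an interpolation of two unlabeled samples is encouraged to match the same interpolation of the two individual predictions. The version below is a simplified, single-machine illustration with a scalar-output model; the function names are hypothetical, and it omits both the distributed ADMM formulation and any details specific to the DSFR model.

```python
def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two feature vectors."""
    return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]

def icr_loss(model, u1, u2, lam):
    """Squared gap between the prediction at the mixed input and the
    mixed predictions, for unlabeled samples u1 and u2."""
    pred_of_mix = model(mix(u1, u2, lam))
    mix_of_preds = lam * model(u1) + (1.0 - lam) * model(u2)
    return (pred_of_mix - mix_of_preds) ** 2

def affine(x):
    """Affine model: consistent under interpolation by construction."""
    return 2.0 * x[0] + 1.0

def quadratic(x):
    """Nonlinear model: interpolation consistency is violated."""
    return x[0] ** 2

loss_affine = icr_loss(affine, [0.0], [2.0], 0.5)
loss_quadratic = icr_loss(quadratic, [0.0], [2.0], 0.5)
```

An affine model incurs zero penalty because it commutes with interpolation, while the quadratic model is penalized (here the loss is 1.0), which shows how the term discourages sharp changes between unlabeled samples.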
Too Global To Be Local: Swarm Consensus in Adversarial Settings
Reaching a consensus in a swarm of robots is one of the fundamental problems in swarm robotics, examining the possibility of reaching an agreement within the swarm members. The recently introduced contamination problem offers a new perspective on the problem, in which swarm members should reach a consensus in spite of the existence of adversarial members that intentionally act to divert the swarm members towards a different consensus. In this paper, we search for a consensus-reaching algorithm under the contamination problem setting by taking a top-down approach: we transform the problem into a centralized two-player game in which each player controls the behavior of a subset of the swarm, trying to force the entire swarm to converge to an agreement on its own value. We define a performance metric for each player's performance, proving a correlation between this metric and the player's chances of winning the game. We then present the globally optimal solution to the game and prove that, unfortunately, it is unattainable in a distributed setting, due to the challenging characteristics of the swarm members. We therefore examine the problem on a simplified swarm model, and compare the performance of the globally optimal strategy with locally optimal strategies, demonstrating its superiority in rigorous simulation experiments.
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning
Bettini, Matteo, Kortvelesy, Ryan, Blumenkamp, Jan, Prorok, Amanda
While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It comprises a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, making it more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at https://github.com/proroklab/VectorizedMultiAgentSimulator. A video of VMAS scenarios and experiments is available at https://youtu.be/aaDRYfiesAY.
Edge Computing Architectures for Enabling the Realisation of the Next Generation Robotic Systems
Seisa, Achilleas Santi, Damigos, Gerasimos, Satpute, Sumeet Gajanan, Koval, Anton, Nikolakopoulos, George
Edge Computing is a promising technology to provide new capabilities in technological fields that require instantaneous data processing. Researchers in areas such as machine and deep learning make extensive use of edge and cloud computing for their applications, mainly due to the significant computational and storage resources that they provide. Currently, Robotics is seeking to take advantage of these capabilities as well, and with the development of 5G networks, some existing limitations in the field can be overcome. In this context, it is important to know how to utilize the emerging edge architectures, what types of edge architectures and platforms exist today, and which of them can and should be used for each robotic application. In general, Edge platforms can be implemented and used differently, especially since there are several providers offering more or less the same set of services with some essential differences. Thus, this study addresses these questions for those who work on the development of next-generation robotic systems and will help them understand the advantages and disadvantages of each edge computing architecture in order to choose the right one for each application.