Goto

Collaborating Authors

 Cai, Xiaoyi


GRAM: Generalization in Deep RL with a Robust Adaptation Module

arXiv.org Machine Learning

The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and outof-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across indistribution and out-of-distribution scenarios upon deployment, which we demonstrate on a variety of realistic simulated locomotion tasks with a quadruped robot. Due to the diverse and uncertain nature of real-world settings, generalization is an important capability for the reliable deployment of data-driven, learning-based frameworks such as deep reinforcement learning (RL). Policies trained with deep RL must be capable of generalizing to a variety of different environment dynamics at deployment time, including both familiar training conditions and novel unseen scenarios, as the complex nature of real-world environments makes it difficult to capture all possible variations in the training process. Existing approaches to zero-shot dynamics generalization in deep RL have focused on two complementary concepts: adaptation and robustness. Contextual RL techniques (Hallak et al., 2015) learn to identify and adapt to the current environment dynamics to achieve the best performance, but this adaptation is only reliable for the range of in-distribution (ID) scenarios seen during training. Robust RL methods (Nilim & Ghaoui, 2005; Iyengar, 2005), on the other hand, maximize the worst-case performance across a range of possible environment dynamics, providing generalization to out-of-distribution (OOD) scenarios at the cost of conservative performance in ID environments.


CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning

arXiv.org Artificial Intelligence

Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate changes in the constraints different from those used during training. To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with a surrogate efficient optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically-feasible time-parametrization for those paths to obtain a trajectory. Compared to conventional neural network architectures, we demonstrate through numerical evaluations significant improvements in performance and dynamic feasibility under scenarios with new constraints never encountered during training.


EVORA: Deep Evidential Traversability Learning for Risk-Aware Off-Road Autonomy

arXiv.org Artificial Intelligence

Traversing terrain with good traction is crucial for achieving fast off-road navigation. Instead of manually designing costs based on terrain features, existing methods learn terrain properties directly from data via self-supervision, but challenges remain to properly quantify and mitigate risks due to uncertainties in learned models. This work efficiently quantifies both aleatoric and epistemic uncertainties by learning discrete traction distributions and probability densities of the traction predictor's latent features. Leveraging evidential deep learning, we parameterize Dirichlet distributions with the network outputs and propose a novel uncertainty-aware squared Earth Mover's distance loss with a closed-form expression that improves learning accuracy and navigation performance. The proposed risk-aware planner simulates state trajectories with the worst-case expected traction to handle aleatoric uncertainty, and penalizes trajectories moving through terrain with high epistemic uncertainty. Our approach is extensively validated in simulation and on wheeled and quadruped robots, showing improved navigation performance compared to methods that assume no slip, assume the expected traction, or optimize for the worst-case expected cost.


Probabilistic Traversability Model for Risk-Aware Motion Planning in Off-Road Environments

arXiv.org Artificial Intelligence

A key challenge in off-road navigation is that even visually similar terrains or ones from the same semantic class may have substantially different traction properties. Existing work typically assumes no wheel slip or uses the expected traction for motion planning, where the predicted trajectories provide a poor indication of the actual performance if the terrain traction has high uncertainty. In contrast, this work proposes to analyze terrain traversability with the empirical distribution of traction parameters in unicycle dynamics, which can be learned by a neural network in a self-supervised fashion. The probabilistic traction model leads to two risk-aware cost formulations that account for the worst-case expected cost and traction. To help the learned model generalize to unseen environment, terrains with features that lead to unreliable predictions are detected via a density estimator fit to the trained network's latent space and avoided via auxiliary penalties during planning. Simulation results demonstrate that the proposed approach outperforms existing work that assumes no slip or uses the expected traction in both navigation success rate and completion time. Furthermore, avoiding terrains with low density-based confidence score achieves up to 30% improvement in success rate when the learned traction model is used in a novel environment.


RAMP: A Risk-Aware Mapping and Planning Pipeline for Fast Off-Road Ground Robot Navigation

arXiv.org Artificial Intelligence

A key challenge in fast ground robot navigation in 3D terrain is balancing robot speed and safety. Recent work has shown that 2.5D maps (2D representations with additional 3D information) are ideal for real-time safe and fast planning. However, the prevalent approach of generating 2D occupancy grids through raytracing makes the generated map unsafe to plan in, due to inaccurate representation of unknown space. Additionally, existing planners such as MPPI do not consider speeds in known free and unknown space separately, leading to slower overall plans. The RAMP pipeline proposed here solves these issues using new mapping and planning methods. This work first presents ground point inflation with persistent spatial memory as a way to generate accurate occupancy grid maps from classified pointclouds. Then we present an MPPI-based planner with embedded variability in horizon, to maximize speed in known free space while retaining cautionary penetration into unknown space. Finally, we integrate this mapping and planning pipeline with risk constraints arising from 3D terrain, and verify that it enables fast and safe navigation using simulations and hardware demonstrations.


Energy-Aware, Collision-Free Information Gathering for Heterogeneous Robot Teams

arXiv.org Artificial Intelligence

This paper considers the problem of safely coordinating a team of sensor-equipped robots to reduce uncertainty about a dynamical process, where the objective trades off information gain and energy cost. Optimizing this trade-off is desirable, but leads to a non-monotone objective function in the set of robot trajectories. Therefore, common multi-robot planners based on coordinate descent lose their performance guarantees. Furthermore, methods that handle non-monotonicity lose their performance guarantees when subject to inter-robot collision avoidance constraints. As it is desirable to retain both the performance guarantee and safety guarantee, this work proposes a hierarchical approach with a distributed planner that uses local search with a worst-case performance guarantees and a decentralized controller based on control barrier functions that ensures safety and encourages timely arrival at sensing locations. Via extensive simulations, hardware-in-the-loop tests and hardware experiments, we demonstrate that the proposed approach achieves a better trade-off between sensing and energy cost than coordinate-descent-based algorithms.