reach-avoid problem
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
Current reinforcement-learning methods are unable to directly learn policies that solve the minimum cost reach-avoid problem to minimize cumulative costs subject to the constraints of reaching the goal and avoiding unsafe states, as the structure of this new optimization problem is incompatible with current methods. Instead, a surrogate problem is solved where all objectives are combined with a weighted sum. However, this surrogate objective results in suboptimal policies that do not directly minimize the cumulative cost. In this work, we propose RC-PPO, a reinforcement-learning-based method for solving the minimum-cost reach-avoid problem by using connections to Hamilton-Jacobi reachability. Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates to while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks on the Mujoco simulator. The project page can be found at https://oswinso.xyz/rcppo.
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
Current reinforcement-learning methods are unable to directly learn policies that solve the minimum cost reach-avoid problem to minimize cumulative costs subject to the constraints of reaching the goal and avoiding unsafe states, as the structure of this new optimization problem is incompatible with current methods. Instead, a surrogate problem is solved where all objectives are combined with a weighted sum. However, this surrogate objective results in suboptimal policies that do not directly minimize the cumulative cost. In this work, we propose RC-PPO, a reinforcement-learning-based method for solving the minimum-cost reach-avoid problem by using connections to Hamilton-Jacobi reachability. Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates to while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks on the Mujoco simulator.
Solving Minimum-Cost Reach Avoid using Reinforcement Learning
So, Oswin, Ge, Cheng, Fan, Chuchu
Current reinforcement-learning methods are unable to directly learn policies that solve the minimum cost reach-avoid problem to minimize cumulative costs subject to the constraints of reaching the goal and avoiding unsafe states, as the structure of this new optimization problem is incompatible with current methods. Instead, a surrogate problem is solved where all objectives are combined with a weighted sum. However, this surrogate objective results in suboptimal policies that do not directly minimize the cumulative cost. In this work, we propose RC-PPO, a reinforcement-learning-based method for solving the minimum-cost reach-avoid problem by using connections to Hamilton-Jacobi reachability. Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates to while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks on the Mujoco simulator. The project page can be found at https://oswinso.xyz/rcppo/.
Guaranteed Reach-Avoid for Black-Box Systems through Narrow Gaps via Neural Network Reachability
Chung, Long Kiu, Jung, Wonsuhk, Pullabhotla, Srivatsank, Shinde, Parth, Sunil, Yadu, Kota, Saihari, Batista, Luis Felipe Wolf, Pradalier, Cédric, Kousik, Shreyas
In the classical reach-avoid problem, autonomous mobile robots are tasked to reach a goal while avoiding obstacles. However, it is difficult to provide guarantees on the robot's performance when the obstacles form a narrow gap and the robot is a black-box (i.e. the dynamics are not known analytically, but interacting with the system is cheap). To address this challenge, this paper presents NeuralPARC. The method extends the authors' prior Piecewise Affine Reach-avoid Computation (PARC) method to systems modeled by rectified linear unit (ReLU) neural networks, which are trained to represent parameterized trajectory data demonstrated by the robot. NeuralPARC computes the reachable set of the network while accounting for modeling error, and returns a set of states and parameters with which the black-box system is guaranteed to reach the goal and avoid obstacles. Through numerical experiments, NeuralPARC is shown to outperform PARC in generating provably-safe extreme vehicle drift parking maneuvers, as well as enabling safety on an autonomous surface vehicle (ASV) subjected to large disturbances and controlled by a deep reinforcement learning (RL) policy.