feasibility model
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning
Theile, Mirco, Rodriguez, Andres R. Zapata, Caccamo, Marco, Sangiovanni-Vincentelli, Alberto L.
-- Unmanned Aerial V ehicle (UA V) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UA V operations require power-efficient continuous motion planning. We formulate the UA V CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies. Unmanned Aerial V ehicle (UA V) Coverage Path Planning (CPP) is a challenging problem with numerous real-world applications.
Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
Theile, Mirco, Dirnberger, Lukas, Trumpp, Raphael, Caccamo, Marco, Sangiovanni-Vincentelli, Alberto L.
Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.
Retro-fallback: retrosynthetic planning in an uncertain world
Tripp, Austin, Maziarz, Krzysztof, Lewis, Sarah, Segler, Marwin, Hernández-Lobato, José Miguel
Retrosynthesis is the task of proposing a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by the algorithm may not work in a laboratory. In this paper we propose a novel formulation of retrosynthesis in terms of stochastic processes to account for this uncertainty. We then propose a novel greedy algorithm called retro-fallback which maximizes the probability that at least one synthesis plan can be executed in the lab. Using in-silico benchmarks we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms.
Active Learning of Abstract Plan Feasibility
Noseworthy, Michael, Moses, Caris, Brand, Isaiah, Castro, Sebastian, Kaelbling, Leslie, Lozano-Pérez, Tomás, Roy, Nicholas
Long horizon sequential manipulation tasks are effectively addressed hierarchically: at a high level of abstraction the planner searches over abstract action sequences, and when a plan is found, lower level motion plans are generated. Such a strategy hinges on the ability to reliably predict that a feasible low level plan will be found which satisfies the abstract plan. However, computing Abstract Plan Feasibility (APF) is difficult because the outcome of a plan depends on real-world phenomena that are difficult to model, such as noise in estimation and execution. In this work, we present an active learning approach to efficiently acquire an APF predictor through task-independent, curious exploration on a robot. The robot identifies plans whose outcomes would be informative about APF, executes those plans, and learns from their successes or failures. Critically, we leverage an infeasible subsequence property to prune candidate plans in the active learning strategy, allowing our system to learn from less data. We evaluate our strategy in simulation and on a real Franka Emika Panda robot with integrated perception, experimentation, planning, and execution. In a stacking domain where objects have non-uniform mass distributions, we show that our system permits real robot learning of an APF model in four hundred self-supervised interactions, and that our learned model can be used effectively in multiple downstream tasks.