trajectory planner
Reinforcement Learning-based Dynamic Adaptation for Sampling-Based Motion Planning in Agile Autonomous Driving
Langmann, Alexander, Tokarev, Yevhenii, Piccinini, Mattia, Moller, Korbinian, Betz, Johannes
Sampling-based trajectory planners are widely used for agile autonomous driving due to their ability to generate fast, smooth, and kinodynamically feasible trajectories. However, their behavior is often governed by a cost function with manually tuned, static weights, which forces a tactical compromise that is suboptimal across the wide range of scenarios encountered in a race. To address this shortcoming, we propose using a Reinforcement Learning (RL) agent as a high-level behavioral selector that dynamically switches the cost function parameters of an analytical, low-level trajectory planner during runtime. We show the effectiveness of our approach in simulation in an autonomous racing environment where our RL-based planner achieved 0% collision rate while reducing overtaking time by up to 60% compared to state-of-the-art static planners. Our new agent now dynamically switches between aggressive and conservative behaviors, enabling interactive maneuvers unattainable with static configurations. These results demonstrate that integrating reinforcement learning as a high-level selector resolves the inherent trade-off between safety and competitiveness in autonomous racing planners. The proposed methodology offers a pathway toward adaptive yet interpretable motion planning for broader autonomous driving applications.
Efficient Navigation in Unknown Indoor Environments with Vision-Language Models
Schwartz, D., Kondo, K., How, J. P.
We present a novel high-level planning framework that leverages vision-language models (VLMs) to improve autonomous navigation in unknown indoor environments with many dead ends. Traditional exploration methods often take inefficient routes due to limited global reasoning and reliance on local heuristics. In contrast, our approach enables a VLM to reason directly about occupancy maps in a zero-shot manner, selecting subgoals that are likely to yield more efficient paths. At each planning step, we convert a 3D occupancy grid into a partial 2D map of the environment, and generate candidate subgoals. Each subgoal is then evaluated and ranked against other candidates by the model. We integrate this planning scheme into DYNUS \cite{kondo2025dynus}, a state-of-the-art trajectory planner, and demonstrate improved navigation efficiency in simulation. The VLM infers structural patterns (e.g., rooms, corridors) from incomplete maps and balances the need to make progress toward a goal against the risk of entering unknown space. This reduces common greedy failures (e.g., detouring into small rooms) and achieves about 10\% shorter paths on average.
What Did I Learn? Operational Competence Assessment for AI-Based Trajectory Planners
Braat, Michiel, Buermann, Maren, van Weperen, Marijke, Paardekooper, Jan-Pieter
Automated driving functions increasingly rely on machine learning for tasks like perception and trajectory planning, requiring large, relevant datasets. The performance of these algorithms depends on how closely the training data matches the task. To ensure reliable functioning, it is crucial to know what is included in the dataset to assess the trained model's operational risk. We aim to enhance the safe use of machine learning in automated driving by developing a method to recognize situations that an automated vehicle has not been sufficiently trained on. This method also improves explainability by describing the dataset at a human-understandable level. We propose modeling driving data as knowledge graphs, representing driving scenes with entities and their relationships. These graphs are queried for specific sub-scene configurations to check their occurrence in the dataset. We estimate a vehicle's competence in a driving scene by considering the coverage and complexity of sub-scene configurations in the training set. Higher complexity scenes require greater coverage for high competence. We apply this method to the NuPlan dataset, modeling it with knowledge graphs and analyzing the coverage of specific driving scenes. This approach helps monitor the competence of machine learning models trained on the dataset, which is essential for trustworthy AI to be deployed in automated driving.
Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN
Cibula, Miroslav, Malinovská, Kristína, Kerzel, Matthias
Trajectory planning in robotics is understood as generating a sequence of joint configurations that will lead a robotic agent, or its manipulator, from an initial state to the desired final state, thus completing a manipulation task while considering constraints like robot kinematics and the environment. Typically, this is achieved via sampling-based planners, which are computationally intensive. Recent advances demonstrate that trajectory planning can also be performed by supervised sequence learning of trajectories, often requiring only a single or fixed number of passes through a neural architecture, thus ensuring a bounded computation time. Such fully supervised approaches, however, perform imitation learning; they do not learn based on whether the trajectories can successfully reach a goal, but try to reproduce observed trajectories. In our work, we build on this approach and propose a cognitively inspired self-supervised learning scheme based on a recurrent architecture for building a trajectory model. We evaluate the feasibility of the proposed method on a task of kinematic planning for a robotic arm. The results suggest that the model is able to learn to generate trajectories only using given paired forward and inverse kinematics models, and indicate that this novel method could facilitate planning for more complex manipulation tasks requiring adaptive solutions.
Fast Trajectory Planner with a Reinforcement Learning-based Controller for Robotic Manipulators
Wang, Yongliang, Kasaei, Hamidreza
Generating obstacle-free trajectories for robotic manipulators in unstructured and cluttered environments remains a significant challenge. Existing motion planning methods often require additional computational effort to generate the final trajectory by solving kinematic or dynamic equations. This paper highlights the strong potential of model-free reinforcement learning methods over model-based approaches for obstacle-free trajectory planning in joint space. We propose a fast trajectory planning system for manipulators that combines vision-based path planning in task space with reinforcement learning-based obstacle avoidance in joint space. We divide the framework into two key components. The first introduces an innovative vision-based trajectory planner in task space, leveraging the large-scale fast segment anything (FSA) model in conjunction with basis spline (B-spline)-optimized kinodynamic path searching. The second component enhances the proximal policy optimization (PPO) algorithm by integrating action ensembles (AE) and policy feedback (PF), which greatly improve precision and stability in goal-reaching and obstacle avoidance within the joint space. These PPO enhancements increase the algorithm's adaptability across diverse robotic tasks, ensuring consistent execution of commands from the first component by the manipulator, while also enhancing both obstacle avoidance efficiency and reaching accuracy. The experimental results demonstrate the effectiveness of PPO enhancements, as well as simulation-to-simulation (Sim-to-Sim) and simulation-to-reality (Sim-to-Real) transfer, in improving model robustness and planner efficiency in complex scenarios. These enhancements allow the robot to perform obstacle avoidance and real-time trajectory planning in obstructed environments. Project page available at: https://sites.google.com/view/ftp4rm/home
Cooperative Grasping for Collective Object Transport in Constrained Environments
Alvear, David, Turkiyyah, George, Park, Shinkyu
-- We propose a novel framework for decision-making in cooperative grasping for two-robot object transport in constrained environments. The core of the framework is a Conditional Embedding (CE) model consisting of two neural networks that map grasp configuration information into an embedding space. The resulting embedding vectors are then used to identify feasible grasp configurations that allow two robots to collaboratively transport an object. T o ensure general-izability across diverse environments and object geometries, the neural networks are trained on a dataset comprising a range of environment maps and object shapes. We employ a supervised learning approach with negative sampling to ensure that the learned embeddings effectively distinguish between feasible and infeasible grasp configurations. Evaluation results across a wide range of environments and objects in simulations demonstrate the model's ability to reliably identify feasible grasp configurations.
Sequence Modeling for Time-Optimal Quadrotor Trajectory Optimization with Sampling-based Robustness Analysis
Mao, Katherine, Yu, Hongzhan, Zhang, Ruipeng, Spasojevic, Igor, Hsieh, M Ani, Gao, Sicun, Kumar, Vijay
Time-optimal trajectories drive quadrotors to their dynamic limits, but computing such trajectories involves solving non-convex problems via iterative nonlinear optimization, making them prohibitively costly for real-time applications. In this work, we investigate learning-based models that imitate a model-based time-optimal trajectory planner to accelerate trajectory generation. Given a dataset of collision-free geometric paths, we show that modeling architectures can effectively learn the patterns underlying time-optimal trajectories. We introduce a quantitative framework to analyze local analytic properties of the learned models, and link them to the Backward Reachable Tube of the geometric tracking controller. To enhance robustness, we propose a data augmentation scheme that applies random perturbations to the input paths. Compared to classical planners, our method achieves substantial speedups, and we validate its real-time feasibility on a hardware quadrotor platform. Experiments demonstrate that the learned models generalize to previously unseen path lengths. The code for our approach can be found here: https://github.com/maokat12/lbTOPPQuad
Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving Styles
Theiner, Lukas, Hirt, Sebastian, Steinke, Alexander, Findeisen, Rolf
Trajectory planning for automated vehicles commonly employs optimization over a moving horizon - Model Predictive Control - where the cost function critically influences the resulting driving style. However, finding a suitable cost function that results in a driving style preferred by passengers remains an ongoing challenge. We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preference. Due to increasing dimensionality of the parameter space, preference learning approaches might struggle to find a suitable optimum with a limited number of experiments and expose the passenger to discomfort when exploring the parameter space. We address these challenges by incorporating prior knowledge into the preferential Bayesian optimization framework. Our method constructs a virtual decision maker from real-world human driving data to guide parameter sampling. In a simulation experiment, we achieve faster convergence of the prior-knowledge-informed learning procedure compared to existing preferential Bayesian optimization approaches and reduce the number of inadequate driving styles sampled.
Chance-Constrained Trajectory Planning with Multimodal Environmental Uncertainty
Ren, Kai, Ahn, Heejin, Kamgarpour, Maryam
We tackle safe trajectory planning under Gaussian mixture model (GMM) uncertainty. Specifically, we use a GMM to model the multimodal behaviors of obstacles' uncertain states. Then, we develop a mixed-integer conic approximation to the chance-constrained trajectory planning problem with deterministic linear systems and polyhedral obstacles. When the GMM moments are estimated via finite samples, we develop a tight concentration bound to ensure the chance constraint with a desired confidence. Moreover, to limit the amount of constraint violation, we develop a Conditional Value-at-Risk (CVaR) approach corresponding to the chance constraints and derive a tractable approximation for known and estimated GMM moments. We verify our methods with state-of-the-art trajectory prediction algorithms and autonomous driving datasets.
Autonomous Robotic Bone Micro-Milling System with Automatic Calibration and 3D Surface Fitting
Zhao, Enduo, Lin, Xiaofeng, Wang, Yifan, Harada, Kanako
Automating bone micro-milling using a robotic system presents challenges due to the uncertainties in both the external and internal features of bone tissue. For example, during a mouse cranial window creation, a circular path with a radius of 2 to 4 mm needs to be milled on the mouse skull using a microdrill. The uneven surface and non-uniform thickness of the mouse skull make it difficult to fully automate this process, requiring the system to possess advanced perceptual and adaptive capabilities. In this study, we propose an automatic calibration and 3D surface fitting method and integrate it into an autonomous robotic bone micro-milling system, enabling it to quickly, in real-time, and accurately perceive and adapt to the uneven surface and non-uniform thickness of the target without human assistance. Validation experiments on euthanized mice demonstrate that the improved system achieves a success rate of 85.7 % and an average milling time of 2.1 minutes, showing not only significant performance improvements over the previous system but also exceptional accuracy, speed, and stability compared to human operators.