Vu, Minh Nhat
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
Vu, Minh Nhat, Ebmer, Gerald, Watcher, Alexander, Ecker, Marc-Philip, Nguyen, Giang, Glueck, Tobias
Autonomous large-scale machine operations require fast, efficient, and collision-free motion planning while addressing unique challenges such as hydraulic actuation limits and underactuated joint dynamics. This paper presents a novel two-step motion planning framework designed for an underactuated forestry crane. The first step employs GPU-accelerated stochastic optimization to rapidly compute a globally shortest collision-free path. The second step refines this path into a dynamically feasible trajectory using a trajectory optimizer that ensures compliance with system dynamics and actuation constraints. The proposed approach is benchmarked against conventional techniques, including RRT-based methods and purely optimization-based approaches. Simulation results demonstrate substantial improvements in computation speed and motion feasibility, making this method highly suitable for complex crane systems.
RoboDesign1M: A Large-scale Dataset for Robot Design Understanding
Le, Tri, Nguyen, Toan, Tran, Quang, Nguyen, Quang, Huang, Baoru, Nguyen, Hoan, Vu, Minh Nhat, Ta, Tung D., Nguyen, Anh
Robot design is a complex and time-consuming process that requires specialized expertise. Gaining a deeper understanding of robot design data can enable various applications, including automated design generation, retrieving example designs from text, and developing AI-powered design assistants. While recent advancements in foundation models present promising approaches to addressing these challenges, progress in this field is hindered by the lack of large-scale design datasets. In this paper, we introduce RoboDesign1M, a large-scale dataset comprising 1 million samples. Our dataset features multimodal data collected from scientific literature, covering various robotics domains. We propose a semi-automated data collection pipeline, enabling efficient and diverse data acquisition. To assess the effectiveness of RoboDesign1M, we conduct extensive experiments across multiple tasks, including design image generation, visual question answering about designs, and design image retrieval. The results demonstrate that our dataset serves as a challenging new benchmark for design understanding tasks and has the potential to advance research in this field. RoboDesign1M will be released to support further developments in AI-driven robotic design automation.
FlowMP: Learning Motion Fields for Robot Planning with Conditional Flow Matching
Nguyen, Khang, Le, An T., Pham, Tien, Huber, Manfred, Peters, Jan, Vu, Minh Nhat
Prior flow matching methods in robotics have primarily learned velocity fields to morph one distribution of trajectories into another. In this work, we extend flow matching to capture second-order trajectory dynamics, incorporating acceleration effects either explicitly in the model or implicitly through the learning objective. Unlike diffusion models, which rely on a noisy forward process and iterative denoising steps, flow matching trains a continuous transformation (flow) that directly maps a simple prior distribution to the target trajectory distribution without any denoising procedure. By modeling trajectories with second-order dynamics, our approach ensures that generated robot motions are smooth and physically executable, avoiding the jerky or dynamically infeasible trajectories that first-order models might produce. We empirically demonstrate that this second-order conditional flow matching yields superior performance on motion planning benchmarks, achieving smoother trajectories and higher success rates than baseline planners. These findings highlight the advantage of learning acceleration-aware motion fields, as our method outperforms existing motion planning methods in terms of trajectory quality and planning success.
Towards Autonomous Wood-Log Grasping with a Forestry Crane: Simulator and Benchmarking
Vu, Minh Nhat, Wachter, Alexander, Ebmer, Gerald, Ecker, Marc-Philip, Glรผck, Tobias, Nguyen, Anh, Kemmetmueller, Wolfgang, Kugi, Andreas
Forestry machines operated in forest production environments face challenges when performing manipulation tasks, especially regarding the complicated dynamics of underactuated crane systems and the heavy weight of logs to be grasped. This study investigates the feasibility of using reinforcement learning for forestry crane manipulators in grasping and lifting heavy wood logs autonomously. We first build a simulator using Mujoco physics engine to create realistic scenarios, including modeling a forestry crane with 8 degrees of freedom from CAD data and wood logs of different sizes. We further implement a velocity controller for autonomous log grasping with deep reinforcement learning using a curriculum strategy. Utilizing our new simulator, the proposed control strategy exhibits a success rate of 96% when grasping logs of different diameters and under random initial configurations of the forestry crane. In addition, reward functions and reinforcement learning baselines are implemented to provide an open-source benchmark for the community in large-scale manipulation tasks. A video with several demonstrations can be seen at https://www.acin.tuwien.ac.at/en/d18a/
Online Trajectory Replanner for Dynamically Grasping Irregular Objects
Vu, Minh Nhat, Grander, Florian, Nguyen, Anh
This paper presents a new trajectory replanner for grasping irregular objects. Unlike conventional grasping tasks where the object's geometry is assumed simple, we aim to achieve a "dynamic grasp" of the irregular objects, which requires continuous adjustment during the grasping process. To effectively handle irregular objects, we propose a trajectory optimization framework that comprises two phases. Firstly, in a specified time limit of 10s, initial offline trajectories are computed for a seamless motion from an initial configuration of the robot to grasp the object and deliver it to a pre-defined target location. Secondly, fast online trajectory optimization is implemented to update robot trajectories in real-time within 100 ms. This helps to mitigate pose estimation errors from the vision system. To account for model inaccuracies, disturbances, and other non-modeled effects, trajectory tracking controllers for both the robot and the gripper are implemented to execute the optimal trajectories from the proposed framework. The intensive experimental results effectively demonstrate the performance of our trajectory planning framework in both simulation and real-world scenarios.
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Nguyen, Nghia, Vu, Minh Nhat, Ta, Tung D., Huang, Baoru, Vo, Thieu, Le, Ngan, Nguyen, Anh
Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding. However, CLIP was trained solely on static images paired with text prompts and has not yet been fully adapted for robotic tasks involving dynamic actions. In this paper, we introduce Robotic-CLIP to enhance robotic perception capabilities. We first gather and label large-scale action data, and then build our Robotic-CLIP by fine-tuning CLIP on 309,433 videos (~7.4 million frames) of action data using contrastive learning. By leveraging action data, Robotic-CLIP inherits CLIP's strong image performance while gaining the ability to understand actions in robotic contexts. Intensive experiments show that our Robotic-CLIP outperforms other CLIP-based models across various language-driven robotic tasks. Additionally, we demonstrate the practical effectiveness of Robotic-CLIP in real-world grasping applications.
GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
Nguyen, Huy Hoang, Vuong, An, Nguyen, Anh, Reid, Ian, Vu, Minh Nhat
Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Intensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.
Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning
Nguyen, Huy Hoang, Vu, Minh Nhat, Beck, Florian, Ebmer, Gerald, Nguyen, Anh, Kugi, Andreas
Combining a vision module inside a closed-loop control system for a \emph{seamless movement} of a robot in a manipulation task is challenging due to the inconsistent update rates between utilized modules. This task is even more difficult in a dynamic environment, e.g., objects are moving. This paper presents a \emph{modular} zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and an online 6D object pose localization. We segment an object within $\SI{0.5}{\second}$ by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system, including a unified pose estimation and tracking and online trajectory planning, is utilized to continuously track this object and compute the optimal trajectory in real-time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experiment results exhibit the real-time capability of the proposed zero-shot modular framework for the trajectory optimization module to accurately and efficiently grasp moving objects, i.e., up to \SI{30}{\hertz} update rates for the online 6D pose localization module and \SI{10}{\hertz} update rates for the receding-horizon trajectory optimization. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video in https://www.acin.tuwien.ac.at/en/6e64/.
Hierarchical Motion Planning and Offline Robust Model Predictive Control for Autonomous Vehicles
Nguyen, Hung Duy, Vu, Minh Nhat, Nam, Nguyen Ngoc, Han, Kyoungseok
Driving vehicles in complex scenarios under harsh conditions is the biggest challenge for autonomous vehicles (AVs). To address this issue, we propose hierarchical motion planning and robust control strategy using the front-active steering system in complex scenarios with various slippery road adhesion coefficients while considering vehicle uncertain parameters. Behaviors of human vehicles (HVs) are considered and modeled in the form of a car-following model via the Intelligent Driver Model (IDM). Then, in the upper layer, the motion planner first generates an optimal trajectory by using the artificial potential field (APF) algorithm to formulate any surrounding objects, e.g., road marks, boundaries, and static/dynamic obstacles. To track the generated optimal trajectory, in the lower layer, an offline-constrained output feedback robust model predictive control (RMPC) is employed for the linear parameter varying (LPV) system by applying linear matrix inequality (LMI) optimization method that ensures the robustness against the model parameter uncertainties. Furthermore, by augmenting the system model, our proposed approach, called offline RMPC, achieves outstanding efficiency compared to three existing RMPC approaches, e.g., offset-offline RMPC, online RMPC, and offline RMPC without an augmented model (offline RMPC w/o AM), in both improving computing time and reducing input vibrations.
Model Predictive Trajectory Optimization With Dynamically Changing Waypoints for Serial Manipulators
Beck, Florian, Vu, Minh Nhat, Hartl-Nesic, Christian, Kugi, Andreas
Systematically including dynamically changing waypoints as desired discrete actions, for instance, resulting from superordinate task planning, has been challenging for online model predictive trajectory optimization with short planning horizons. This paper presents a novel waypoint model predictive control (wMPC) concept for online replanning tasks. The main idea is to split the planning horizon at the waypoint when it becomes reachable within the current planning horizon and reduce the horizon length towards the waypoints and goal points. This approach keeps the computational load low and provides flexibility in adapting to changing conditions in real time. The presented approach achieves competitive path lengths and trajectory durations compared to (global) offline RRT-type planners in a multi-waypoint scenario. Moreover, the ability of wMPC to dynamically replan tasks online is experimentally demonstrated on a KUKA LBR iiwa 14 R820 robot in a dynamic pick-and-place scenario.