Vu, Minh Nhat
Observer-based Controller Design for Oscillation Damping of a Novel Suspended Underactuated Aerial Platform
Das, Hemjyoti, Vu, Minh Nhat, Egle, Tobias, Ott, Christian
In this work, we present a novel actuation strategy for a suspended aerial platform. By utilizing an underactuation approach, we demonstrate the successful oscillation damping of the proposed platform, modeled as a spherical double pendulum. A state estimator is designed in order to obtain the deflection angles of the platform, which uses only onboard IMU measurements. The state estimator is an extended Kalman filter (EKF) with intermittent measurements obtained at different frequencies. An optimal state feedback controller and a PD+ controller are designed in order to dampen the oscillations of the platform in the joint space and task space respectively. The proposed underactuated platform is found to be more energy-efficient than an omnidirectional platform and requires fewer actuators. The effectiveness of our proposed system is validated using both simulations and experimental studies.
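The idea of fusing measurements that arrive at different frequencies can be illustrated with a much simpler filter than the one in the paper. The sketch below is a scalar linear Kalman filter, not the authors' EKF: the constant-angle model, the two sensor rates, the noise variances, and the noise-free readings are all assumptions chosen to keep the example deterministic.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update for a direct measurement z = x + noise."""
    k = p / (p + r)          # Kalman gain
    x_new = x + k * (z - x)  # corrected state estimate
    p_new = (1.0 - k) * p    # corrected estimate variance
    return x_new, p_new


def run_filter(true_angle=0.3, steps=200, q=1e-6):
    """Estimate a constant angle from two sensors reporting at different rates.

    Hypothetical setup: readings are noise-free so the result is repeatable;
    r only encodes how much the filter trusts each sensor.
    """
    x, p = 0.0, 1.0  # initial estimate and variance
    for step in range(steps):
        p += q  # prediction step: constant-state model, variance grows by q
        if step % 2 == 0:   # fast sensor, every 2nd step, less trusted
            x, p = kalman_update(x, p, true_angle, r=0.05)
        if step % 10 == 0:  # slow sensor, every 10th step, more trusted
            x, p = kalman_update(x, p, true_angle, r=0.01)
    return x, p


if __name__ == "__main__":
    estimate, variance = run_filter()
    print(estimate, variance)
```

The prediction step runs at every tick, while each correction is applied only when its sensor actually delivers a sample, which is the essence of handling intermittent measurements.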
Autonomous Catheterization with Open-source Simulator and Expert Trajectory
Jianu, Tudor, Huang, Baoru, Vo, Tuan, Vu, Minh Nhat, Kang, Jingxuan, Nguyen, Hoan, Omisore, Olatunji, Berthet-Rayne, Pierre, Fichera, Sebastiano, Nguyen, Anh
Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures. In this chapter, we introduce CathSim, the first open-source simulator for endovascular intervention to address these limitations. CathSim emphasizes real-time performance to enable rapid development and testing of learning algorithms. We validate CathSim against the real robot and show that our simulator can successfully mimic the behavior of the real robot. Based on CathSim, we develop a multimodal expert navigation network and demonstrate its effectiveness in downstream endovascular navigation tasks. The intensive experimental results suggest that CathSim has the potential to significantly accelerate research in the autonomous catheterization field. Our project is publicly available at https://github.com/airvlab/cathsim. Endovascular interventions are commonly performed for the diagnosis and treatment of vascular diseases. These interventions involve the use of flexible tools, namely guidewires and catheters. These instruments are introduced into the body via small incisions and manually navigated to specific body regions through the vascular system [69]. Endovascular tool navigation takes approximately 70% of the intervention time and is used for a plethora of vascular-related conditions such as peripheral artery disease, aneurysms, and stenosis [49].
Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers
Ebmer, Gerald, Loch, Adam, Vu, Minh Nhat, Haessig, Germain, Mecca, Roberto, Vincze, Markus, Hartl-Nesic, Christian, Kugi, Andreas
Real-time applications for autonomous operations depend largely on fast and robust vision-based localization systems. Since image processing tasks require processing large amounts of data, the computational resources often limit the performance of other processes. To overcome this limitation, traditional marker-based localization systems are widely used since they are easy to integrate and achieve reliable accuracy. However, classical marker-based localization systems significantly depend on standard cameras with low frame rates, which often lack accuracy due to motion blur. In contrast, event-based cameras provide high temporal resolution and a high dynamic range, which can be utilized for fast localization tasks, even under challenging visual conditions. This paper proposes a simple but effective event-based pose estimation system using active LED markers (ALM) for fast and accurate pose estimation. The proposed algorithm is able to operate in real time with a latency below 0.5 ms while maintaining output rates of 3 kHz. Experimental results in static and dynamic scenarios are presented to demonstrate the performance of the proposed approach in terms of computational speed and absolute accuracy, using the OptiTrack system as the basis for measurement.
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Van Vo, Tuan, Vu, Minh Nhat, Huang, Baoru, Nguyen, Toan, Le, Ngan, Vo, Thieu, Nguyen, Anh
Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method in 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. The intensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% in mIoU score compared to the baselines. Furthermore, it offers real-time inference, which is well suited to robotic manipulation applications.
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Nguyen, Toan, Vu, Minh Nhat, Huang, Baoru, Van Vo, Tuan, Truong, Vy, Le, Ngan, Vo, Thieu, Le, Bac, Nguyen, Anh
Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3DAPNet.github.io
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
Vuong, An Dinh, Vu, Minh Nhat, Le, Hieu, Huang, Baoru, Huynh, Binh, Vo, Thieu, Kugi, Andreas, Nguyen, Anh
Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.
CathSim: An Open-source Simulator for Endovascular Intervention
Jianu, Tudor, Huang, Baoru, Abdelaziz, Mohamed E. M. K., Vu, Minh Nhat, Fichera, Sebastiano, Lee, Chun-Yi, Berthet-Rayne, Pierre, Baena, Ferdinando Rodriguez y, Nguyen, Anh
Autonomous robots in endovascular operations have the potential to navigate circulatory systems safely and reliably while decreasing the susceptibility to human errors. However, there are numerous challenges involved with the process of training such robots, such as long training duration and safety issues arising from the interaction between the catheter and the aorta. Recently, endovascular simulators have been employed for medical training but generally do not conform to autonomous catheterization. Furthermore, most current simulators are closed-source, which hinders the collaborative development of safe and reliable autonomous systems. In this work, we introduce CathSim, an open-source simulation environment that accelerates the development of machine learning algorithms for autonomous endovascular navigation. We first simulate the high-fidelity catheter and aorta with a state-of-the-art endovascular robot. We then provide the capability of real-time force sensing between the catheter and the aorta in simulation. Furthermore, we validate our simulator by conducting two different catheterization tasks using two popular reinforcement learning algorithms. The experimental results show that our open-source simulator can mimic the behaviour of real-world endovascular robots and facilitate the development of different autonomous catheterization tasks. Our simulator is publicly available at https://github.com/robotvisionlabs/cathsim.
Open-Vocabulary Affordance Detection in 3D Point Clouds
Nguyen, Toan, Vu, Minh Nhat, Vuong, An, Nguyen, Dzung, Vo, Thieu, Le, Ngan, Nguyen, Anh
Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.
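One common way to relate per-point features to free-form text labels is to score each point against each label embedding and pick the best match. The sketch below illustrates that general idea with cosine similarity and a softmax; it is not taken from the paper, and the three-dimensional embeddings, the label names, and the temperature value are made up for illustration (a real system would obtain them from learned point and text encoders).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed value."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_points(point_features, text_embeddings):
    """Assign each point the label whose text embedding correlates best."""
    names = list(text_embeddings)
    assigned = []
    for feature in point_features:
        scores = [cosine(feature, text_embeddings[n]) for n in names]
        probs = softmax(scores)
        best = max(range(len(names)), key=probs.__getitem__)
        assigned.append(names[best])
    return assigned


# Hypothetical toy embeddings standing in for encoder outputs.
TEXT = {"grasp": [1.0, 0.0, 0.0], "pour": [0.0, 1.0, 0.0], "cut": [0.0, 0.0, 1.0]}
POINTS = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.7]]

if __name__ == "__main__":
    print(label_points(POINTS, TEXT))
```

Because the label set is just a dictionary of embeddings, a new affordance can be added at inference time by inserting another entry, which is what makes this style of scoring open-vocabulary.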
Singularity Avoidance with Application to Online Trajectory Optimization for Serial Manipulators
Beck, Florian, Vu, Minh Nhat, Hartl-Nesic, Christian, Kugi, Andreas
Several important tasks in robotics require compliance in the robot's end-effector, including handling tasks such as the peg-in-hole task, see, e.g., Park et al. (2017) and Song et al. (2021), or more recently tasks in physical human-robot interaction (pHRI), see, e.g., Sharifi et al. (2022) and Li et al. (2018). To this end, control concepts enabling compliance in the end-effector, e.g., prescribing a specific [...] Manipulability maximization for inverse kinematics is done, e.g., in Dufour and Suleiman (2017). A potential function on the torque level, as an additive impedance, based on the manipulability measure is proposed in Ott (2008) for singularity avoidance. Due to the complexity introduced by maximizing the manipulability measure, an optimization approach using a dynamic neural network is introduced in Jin et al. (2017) for tracking control including the consideration of joint velocity limits.
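The text does not define the manipulability measure; a widely used definition is Yoshikawa's, w = sqrt(det(J J^T)), which for a square Jacobian reduces to |det J|. The sketch below evaluates it for a planar two-link arm, a toy example chosen here for illustration rather than the manipulator studied in the paper; the link lengths are assumed values.

```python
import math


def jacobian_2r(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian of a planar 2R arm (assumed link lengths l1, l2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def manipulability(jac):
    """Yoshikawa measure for a square 2x2 Jacobian: |det J|."""
    (a, b), (c, d) = jac
    return abs(a * d - b * c)


if __name__ == "__main__":
    for q2 in (0.0, math.pi / 4, math.pi / 2):
        print(q2, manipulability(jacobian_2r(0.3, q2)))
```

For this arm the measure equals l1 * l2 * |sin(q2)|, so it vanishes when the arm is fully stretched or folded (the singular configurations) and peaks at a right-angled elbow.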
Sampling-Based Trajectory (re)planning for Differentially Flat Systems: Application to a 3D Gantry Crane
Vu, Minh Nhat, Schwegel, Michael, Hartl-Nesic, Christian, Kugi, Andreas
In this paper, a sampling-based trajectory planning algorithm for a laboratory-scale 3D gantry crane in an environment with static obstacles and subject to bounds on the velocity and acceleration of the gantry crane system is presented. The focus is on developing a fast motion planning algorithm for differentially flat systems, where intermediate results can be stored and reused for further tasks, such as replanning. The proposed approach is based on the informed optimal rapidly exploring random tree algorithm (informed RRT*), which is utilized to build trajectory trees that are reused for replanning when the start and/or target states change. In contrast to state-of-the-art approaches, the proposed motion planning algorithm incorporates a linear quadratic minimum time (LQTM) local planner. Thus, dynamic properties such as time optimality and the smoothness of the trajectory are directly considered in the proposed algorithm. Moreover, by integrating the branch-and-bound method to perform the pruning process on the trajectory tree, the proposed algorithm can eliminate points in the tree that do not contribute to finding better solutions. This helps to curb memory consumption and reduce the computational complexity during motion (re)planning. Simulation results for a validated mathematical model of a 3D gantry crane show the feasibility of the proposed approach.
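The branch-and-bound pruning step can be sketched as follows. A node is discarded when a lower bound on any solution passing through it is no better than the best solution found so far. This is a simplified illustration, not the paper's implementation: it uses Euclidean path length as the cost and straight-line distance as the lower bound, whereas the paper's local planner is time-based, and the node names and coordinates are invented.

```python
import math


def prune(nodes, goal, best_cost):
    """Keep only nodes that could still lead to a better solution.

    nodes maps a node name to (position, cost_to_come). A node is kept when
    cost_to_come plus an optimistic cost-to-go is below best_cost.
    """
    kept = {}
    for name, (position, cost_to_come) in nodes.items():
        # Straight-line distance never overestimates the remaining path length.
        lower_bound = cost_to_come + math.dist(position, goal)
        if lower_bound < best_cost:
            kept[name] = (position, cost_to_come)
    return kept


# Hypothetical tree nodes: name -> ((x, y), cost accumulated from the start).
TREE = {
    "start": ((0.0, 0.0), 0.0),
    "a": ((1.0, 0.0), 1.0),
    "b": ((0.0, 3.0), 3.0),
    "c": ((2.0, 0.0), 2.0),
}

if __name__ == "__main__":
    print(sorted(prune(TREE, goal=(3.0, 0.0), best_cost=3.5)))
```

With the values above, node "b" has a lower bound of roughly 7.24, which exceeds the current best cost of 3.5, so it is removed while the other nodes remain candidates for reuse.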