Goto

Collaborating Authors

 Armand, Mehran


dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale

arXiv.org Artificial Intelligence

Data scarcity has long been an issue in the robot learning community. Particularly, in safety-critical domains like surgical applications, obtaining high-quality data can be especially difficult. It poses challenges to researchers seeking to exploit recent advancements in reinforcement learning and imitation learning, which have greatly improved generalizability and enabled robots to conduct tasks autonomously. We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks: By eliminating the need for a physical robot setup and providing flexibility in terms of time, space, and hardware resources-such as multiview sensors and actuators-specialized simulation is a viable alternative. At the same time, AR allows the robot data collection to be more egocentric, supported by its body tracking and content overlaying capabilities. Our user study confirms the proposed system's efficiency and usability, where we use widely-used primitive tasks for training teleoperation with da Vinci surgical robots. Data throughput improves across all tasks compared to real robot settings by 41% on average. The total experiment time is reduced by an average of 10%. The temporal demand in the task load survey is improved. These gains are statistically significant. Additionally, the collected data is over 400 times smaller in size, requiring far less storage while achieving double the frequency.


Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation

arXiv.org Artificial Intelligence

Imitation learning frameworks for robotic manipulation have drawn attention in the recent development of language model grounded robotics. However, the success of the frameworks largely depends on the coverage of the demonstration cases: When the demonstration set does not include examples of how to act in all possible situations, the action may fail and can result in cascading errors. To solve this problem, we propose a framework that uses serialized Finite State Machine (FSM) to generate demonstrations and improve the success rate in manipulation tasks requiring a long sequence of precise interactions. To validate its effectiveness, we use environmentally evolving and long-horizon puzzles that require long sequential actions. Experimental results show that our approach achieves a success rate of up to 98 in these tasks, compared to the controlled condition using existing approaches, which only had a success rate of up to 60, and, in some tasks, almost failed completely.


An Image-Guided Robotic System for Transcranial Magnetic Stimulation: System Development and Experimental Evaluation

arXiv.org Artificial Intelligence

Transcranial magnetic stimulation (TMS) is a noninvasive medical procedure that can modulate brain activity, and it is widely used in neuroscience and neurology research. Compared to manual operators, robots may improve the outcome of TMS due to their superior accuracy and repeatability. However, there has not been a widely accepted standard protocol for performing robotic TMS using fine-segmented brain images, resulting in arbitrary planned angles with respect to the true boundaries of the modulated cortex. Given that the recent study in TMS simulation suggests a noticeable difference in outcomes when using different anatomical details, cortical shape should play a more significant role in deciding the optimal TMS coil pose. In this work, we introduce an image-guided robotic system for TMS that focuses on (1) establishing standardized planning methods and heuristics to define a reference (true zero) for the coil poses and (2) solving the issue that the manual coil placement requires expert hand-eye coordination which often leading to low repeatability of the experiments. To validate the design of our robotic system, a phantom study and a preliminary human subject study were performed. Our results show that the robotic method can half the positional error and improve the rotational accuracy by up to two orders of magnitude. The accuracy is proven to be repeatable because the standard deviation of multiple trials is lowered by an order of magnitude. The improved actuation accuracy successfully translates to the TMS application, with a higher and more stable induced voltage in magnetic field sensors.


Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments using Fiber Bragg Grating Sensors

arXiv.org Artificial Intelligence

Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and subsequently reconstructing the shape using optimization algorithms. This optimization, however, is under-constrained and may be ill-posed for complex shapes, falling into local minima. In this work, we introduce a novel method capable of directly estimating a CDM's shape from FBG sensor wavelengths using a deep neural network. In addition, we propose the integration of uncertainty estimation to address the critical issue of uncertainty in neural network predictions. Neural network predictions are unreliable when the input sample is outside the training distribution or corrupted by noise. Recognizing such deviations is crucial when integrating neural networks within surgical robotics, as inaccurate estimations can pose serious risks to the patient. We present a robust method that not only improves the precision upon existing techniques for FBG-based shape estimation but also incorporates a mechanism to quantify the models' confidence through uncertainty estimation. We validate the uncertainty estimation through extensive experiments, demonstrating its effectiveness and reliability on out-of-distribution (OOD) data, adding an additional layer of safety and precision to minimally invasive surgical robotics.


On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality

arXiv.org Artificial Intelligence

Robotic-assisted medical systems (RAMS) have gained significant attention for their advantages in alleviating surgeons' fatigue and improving patients' outcomes. These systems comprise a range of human-computer interactions, including medical scene monitoring, anatomical target planning, and robot manipulation. However, despite its versatility and effectiveness, RAMS demands expertise in robotics, leading to a high learning cost for the operator. In this work, we introduce a novel framework using mixed reality technologies to ease the use of RAMS. The proposed framework achieves real-time planning and execution of medical instruments by providing 3D anatomical image overlay, human-robot collision detection, and robot programming interface. These features, integrated with an easy-to-use calibration method for head-mounted display, improve the effectiveness of human-robot interactions. To assess the feasibility of the framework, two medical applications are presented in this work: 1) coil placement during transcranial magnetic stimulation and 2) drill and injector device positioning during femoroplasty. Results from these use cases demonstrate its potential to extend to a wider range of medical scenarios.


GBEC: Geometry-Based Hand-Eye Calibration

arXiv.org Artificial Intelligence

Hand-eye calibration is the problem of solving the transformation from the end-effector of a robot to the sensor attached to it. Commonly employed techniques, such as AXXB or AXZB formulations, rely on regression methods that require collecting pose data from different robot configurations, which can produce low accuracy and repeatability. However, the derived transformation should solely depend on the geometry of the end-effector and the sensor attachment. We propose Geometry-Based End-Effector Calibration (GBEC) that enhances the repeatability and accuracy of the derived transformation compared to traditional hand-eye calibrations. To demonstrate improvements, we apply the approach to two different robot-assisted procedures: Transcranial Magnetic Stimulation (TMS) and femoroplasty. We also discuss the generalizability of GBEC for camera-in-hand and marker-in-hand sensor mounting methods. In the experiments, we perform GBEC between the robot end-effector and an optical tracker's rigid body marker attached to the TMS coil or femoroplasty drill guide. Previous research documents low repeatability and accuracy of the conventional methods for robot-assisted TMS hand-eye calibration. When compared to some existing methods, the proposed method relies solely on the geometry of the flange and the pose of the rigid-body marker, making it independent of workspace constraints or robot accuracy, without sacrificing the orthogonality of the rotation matrix. Our results validate the accuracy and applicability of the approach, providing a new and generalizable methodology for obtaining the transformation from the end-effector to a sensor.


FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation

arXiv.org Artificial Intelligence

Automated X-ray image segmentation would accelerate research and development in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving specific image analysis problems, but the utility of these models is restricted to their particular task domain, and expanding to broader use requires additional data, labels, and retraining efforts. Recently, foundation models (FMs) -- machine learning models trained on large amounts of highly variable data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing FMs for medical image analysis focus on scenarios and modalities where objects are clearly defined by visually apparent boundaries, such as surgical tool segmentation in endoscopy. X-ray imaging, by contrast, does not generally offer such clearly delineated boundaries or structure priors. During X-ray image formation, complex 3D structures are projected in transmission onto the imaging plane, resulting in overlapping features of varying opacity and shape. To pave the way toward an FM for comprehensive and automated analysis of arbitrary medical X-ray images, we develop FluoroSAM, a language-aligned variant of the Segment-Anything Model, trained from scratch on 1.6M synthetic X-ray images. FluoroSAM is trained on data including masks for 128 organ types and 464 non-anatomical objects, such as tools and implants. In real X-ray images of cadaveric specimens, FluoroSAM is able to segment bony anatomical structures based on text-only prompting with 0.51 and 0.79 DICE with point-based refinement, outperforming competing SAM variants for all structures. FluoroSAM is also capable of zero-shot generalization to segmenting classes beyond the training set thanks to its language alignment, which we demonstrate for full lung segmentation on real chest X-rays.


Realtime Robust Shape Estimation of Deformable Linear Object

arXiv.org Artificial Intelligence

Realtime shape estimation of continuum objects and manipulators is essential for developing accurate planning and control paradigms. The existing methods that create dense point clouds from camera images, and/or use distinguishable markers on a deformable body have limitations in realtime tracking of large continuum objects/manipulators. The physical occlusion of markers can often compromise accurate shape estimation. We propose a robust method to estimate the shape of linear deformable objects in realtime using scattered and unordered key points. By utilizing a robust probability-based labeling algorithm, our approach identifies the true order of the detected key points and then reconstructs the shape using piecewise spline interpolation. The approach only relies on knowing the number of the key points and the interval between two neighboring points. We demonstrate the robustness of the method when key points are partially occluded. The proposed method is also integrated into a simulation in Unity for tracking the shape of a cable with a length of 1m and a radius of 5mm. The simulation results show that our proposed approach achieves an average length error of 1.07% over the continuum's centerline and an average cross-section error of 2.11mm. The real-world experiments of tracking and estimating a heavy-load cable prove that the proposed approach is robust under occlusion and complex entanglement scenarios.


A Roadmap Towards Automated and Regulated Robotic Systems

arXiv.org Artificial Intelligence

The rapid development of generative technology opens up possibility for higher level of automation, and artificial intelligence (AI) embodiment in robotic systems is imminent. However, due to the blackbox nature of the generative technology, the generation of the knowledge and workflow scheme is uncontrolled, especially in a dynamic environment and a complex scene. This poses challenges to regulations in safety-demanding applications such as medical scenes. We argue that the unregulated generative processes from AI is fitted for low level end tasks, but intervention in the form of manual or automated regulation should happen post-workflow-generation and pre-robotic-execution. To address this, we propose a roadmap that can lead to fully automated and regulated robotic systems. In this paradigm, the high level policies are generated as structured graph data, enabling regulatory oversight and reusability, while the code base for lower level tasks is generated by generative models. Our approach aims the transitioning from expert knowledge to regulated action, akin to the iterative processes of study, practice, scrutiny, and execution in human tasks. We identify the generative and deterministic processes in a design cycle, where generative processes serve as a text-based world simulator and the deterministic processes generate the executable system. We propose State Machine Seralization Language (SMSL) to be the conversion point between text simulator and executable workflow control. From there, we analyze the modules involved based on the current literature, and discuss human in the loop. As a roadmap, this work identifies the current possible implementation and future work. This work does not provide an implemented system but envisions to inspire the researchers working on the direction in the roadmap. We implement the SMSL and D-SFO paradigm that serve as the starting point of the roadmap.


Data-Driven Shape Sensing in Continuum Manipulators via Sliding Resistive Flex Sensors

arXiv.org Artificial Intelligence

We introduce a novel shape-sensing method using Resistive Flex Sensors (RFS) embedded in cable-driven Continuum Dexterous Manipulators (CDMs). The RFS is predominantly sensitive to deformation rather than direct forces, making it a distinctive tool for shape sensing. The RFS unit we designed is a considerably less expensive and robust alternative, offering comparable accuracy and real-time performance to existing shape sensing methods used for the CDMs proposed for minimally-invasive surgery. Our design allows the RFS to move along and inside the CDM conforming to its curvature, offering the ability to capture resistance metrics from various bending positions without the need for elaborate sensor setups. The RFS unit is calibrated using an overhead camera and a ResNet machine learning framework. Experiments using a 3D printed prototype of the CDM achieved an average shape estimation error of 0.968 mm with a standard error of 0.275 mm. The response time of the model was approximately 1.16 ms, making real-time shape sensing feasible. While this preliminary study successfully showed the feasibility of our approach for C-shape CDM deformations with non-constant curvatures, we are currently extending the results to show the feasibility for adapting to more complex CDM configurations such as S-shape created in obstructed environments or in presence of the external forces.