Goto

Collaborating Authors

 Unberath, Mathias


Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone

arXiv.org Artificial Intelligence

Primary care providers are vital for initial triage and referrals to specialty care. In glaucoma, asymptomatic and fast progression can lead to vision loss, necessitating timely referrals to specialists. However, primary eye care providers may not identify urgent cases, potentially delaying care. Artificial Intelligence (AI) offering explanations could enhance their referral decisions. We investigate how various AI explanations help providers distinguish between patients needing immediate or non-urgent specialist referrals. We built explainable AI algorithms to predict glaucoma surgery needs from routine eyecare data as a proxy for identifying high-risk patients. We incorporated intrinsic and post-hoc explainability and conducted an online study with optometrists to assess human-AI team performance, measuring referral accuracy and analyzing interactions with AI, including agreement rates, task time, and user experience perceptions. AI support enhanced referral accuracy among 87 participants (59.9%/50.8% with/without AI), though Human-AI teams underperformed compared to AI alone. Participants believed they included AI advice more when using the intrinsic model, and perceived it more useful and promising. Without explanations, deviations from AI recommendations increased. AI support did not increase workload, confidence, and trust, but reduced challenges. On a separate test set, our black-box and intrinsic models achieved an accuracy of 77% and 71%, respectively, in predicting surgical outcomes. We identify opportunities of human-AI teaming for glaucoma management in primary eye care, noting that while AI enhances referral accuracy, it also shows a performance gap compared to AI alone, even with explanations. Human involvement remains essential in medical decision making, underscoring the need for future research to optimize collaboration, ensuring positive experiences and safe AI use.


FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation

arXiv.org Artificial Intelligence

Automated X-ray image segmentation would accelerate research and development in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving specific image analysis problems, but the utility of these models is restricted to their particular task domain, and expanding to broader use requires additional data, labels, and retraining efforts. Recently, foundation models (FMs) -- machine learning models trained on large amounts of highly variable data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing FMs for medical image analysis focus on scenarios and modalities where objects are clearly defined by visually apparent boundaries, such as surgical tool segmentation in endoscopy. X-ray imaging, by contrast, does not generally offer such clearly delineated boundaries or structure priors. During X-ray image formation, complex 3D structures are projected in transmission onto the imaging plane, resulting in overlapping features of varying opacity and shape. To pave the way toward an FM for comprehensive and automated analysis of arbitrary medical X-ray images, we develop FluoroSAM, a language-aligned variant of the Segment-Anything Model, trained from scratch on 1.6M synthetic X-ray images. FluoroSAM is trained on data including masks for 128 organ types and 464 non-anatomical objects, such as tools and implants. In real X-ray images of cadaveric specimens, FluoroSAM is able to segment bony anatomical structures based on text-only prompting with 0.51 and 0.79 DICE with point-based refinement, outperforming competing SAM variants for all structures. FluoroSAM is also capable of zero-shot generalization to segmenting classes beyond the training set thanks to its language alignment, which we demonstrate for full lung segmentation on real chest X-rays.


Designing AI Support for Human Involvement in AI-assisted Decision Making: A Taxonomy of Human-AI Interactions from a Systematic Review

arXiv.org Artificial Intelligence

Efforts in levering Artificial Intelligence (AI) in decision support systems have disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. To address this, explainable AI promotes AI development from a more human-centered perspective. Determining what information AI should provide to aid humans is vital, however, how the information is presented, e. g., the sequence of recommendations and the solicitation of interpretations, is equally crucial. This motivates the need to more precisely study Human-AI interaction as a pivotal component of AI-based decision support. While several empirical studies have evaluated Human-AI interactions in multiple application domains in which interactions can take many forms, there is not yet a common vocabulary to describe human-AI interaction protocols. To address this gap, we describe the results of a systematic review of the AI-assisted decision making literature, analyzing 105 selected articles, which grounds the introduction of a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. We find that current interactions are dominated by simplistic collaboration paradigms and report comparatively little support for truly interactive functionality. Our taxonomy serves as a valuable tool to understand how interactivity with AI is currently supported in decision-making contexts and foster deliberate choices of interaction designs.


Eye Tracking for Tele-robotic Surgery: A Comparative Evaluation of Head-worn Solutions

arXiv.org Artificial Intelligence

Purpose: Metrics derived from eye-gaze-tracking and pupillometry show promise for cognitive load assessment, potentially enhancing training and patient safety through user-specific feedback in tele-robotic surgery. However, current eye-tracking solutions' effectiveness in tele-robotic surgery is uncertain compared to everyday situations due to close-range interactions causing extreme pupil angles and occlusions. To assess the effectiveness of modern eye-gaze-tracking solutions in tele-robotic surgery, we compare the Tobii Pro 3 Glasses and Pupil Labs Core, evaluating their pupil diameter and gaze stability when integrated with the da Vinci Research Kit (dVRK). Methods: The study protocol includes a nine-point gaze calibration followed by pick-and-place task using the dVRK and is repeated three times. After a final calibration, users view a 3x3 grid of AprilTags, focusing on each marker for 10 seconds, to evaluate gaze stability across dVRK-screen positions with the L2-norm. Different gaze calibrations assess calibration's temporal deterioration due to head movements. Pupil diameter stability is evaluated using the FFT from the pupil diameter during the pick-and-place tasks. Users perform this routine with both head-worn eye-tracking systems. Results: Data collected from ten users indicate comparable pupil diameter stability. FFTs of pupil diameters show similar amplitudes in high-frequency components. Tobii Glasses show more temporal gaze stability compared to Pupil Labs, though both eye trackers yield a similar 4cm error in gaze estimation without an outdated calibration. Conclusion: Both eye trackers demonstrate similar stability of the pupil diameter and gaze, when the calibration is not outdated, indicating comparable eye-tracking and pupillometry performance in tele-robotic surgery settings.


TAToo: Vision-based Joint Tracking of Anatomy and Tool for Skull-base Surgery

arXiv.org Artificial Intelligence

Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, as well as on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1{\deg}. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need of any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base.


Twin-S: A Digital Twin for Skull-base Surgery

arXiv.org Artificial Intelligence

Purpose: Digital twins are virtual interactive models of the real world, exhibiting identical behavior and properties. In surgical applications, computational analysis from digital twins can be used, for example, to enhance situational awareness. Methods: We present a digital twin framework for skull-base surgeries, named Twin-S, which can be integrated within various image-guided interventions seamlessly. Twin-S combines high-precision optical tracking and real-time simulation. We rely on rigorous calibration routines to ensure that the digital twin representation precisely mimics all real-world processes. Twin-S models and tracks the critical components of skull-base surgery, including the surgical tool, patient anatomy, and surgical camera. Significantly, Twin-S updates and reflects real-world drilling of the anatomical model in frame rate. Results: We extensively evaluate the accuracy of Twin-S, which achieves an average 1.39 mm error during the drilling process. We further illustrate how segmentation masks derived from the continuously updated digital twin can augment the surgical microscope view in a mixed reality setting, where bone requiring ablation is highlighted to provide surgeons additional situational awareness. Conclusion: We present Twin-S, a digital twin environment for skull-base surgery. Twin-S tracks and updates the virtual model in real-time given measurements from modern tracking technologies. Future research on complementing optical tracking with higher-precision vision-based approaches may further increase the accuracy of Twin-S.


Pelphix: Surgical Phase Recognition from X-ray Images in Percutaneous Pelvic Fixation

arXiv.org Artificial Intelligence

Surgical phase recognition (SPR) is a crucial element in the digital transformation of the modern operating theater. While SPR based on video sources is well-established, incorporation of interventional X-ray sequences has not yet been explored. This paper presents Pelphix, a first approach to SPR for X-ray-guided percutaneous pelvic fracture fixation, which models the procedure at four levels of granularity -- corridor, activity, view, and frame value -- simulating the pelvic fracture fixation workflow as a Markov process to provide fully annotated training data. Using added supervision from detection of bony corridors, tools, and anatomy, we learn image representations that are fed into a transformer model to regress surgical phases at the four granularity levels. Our approach demonstrates the feasibility of X-ray-based SPR, achieving an average accuracy of 93.8% on simulated sequences and 67.57% in cadaver across all granularity levels, with up to 88% accuracy for the target corridor in real data. This work constitutes the first step toward SPR for the X-ray domain, establishing an approach to categorizing phases in X-ray-guided surgery, simulating realistic image sequences to enable machine learning model development, and demonstrating that this approach is feasible for the analysis of real procedures. As X-ray-based SPR continues to mature, it will benefit procedures in orthopedic surgery, angiography, and interventional radiology by equipping intelligent surgical systems with situational awareness in the operating room.


Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models

arXiv.org Artificial Intelligence

To safely deploy deep learning-based computer vision models for computer-aided detection and diagnosis, we must ensure that they are robust and reliable. Towards that goal, algorithmic auditing has received substantial attention. To guide their audit procedures, existing methods rely on heuristic approaches or high-level objectives (e.g., non-discrimination in regards to protected attributes, such as sex, gender, or race). However, algorithms may show bias with respect to various attributes beyond the more obvious ones, and integrity issues related to these more subtle attributes can have serious consequences. To enable the generation of actionable, data-driven hypotheses which identify specific dataset attributes likely to induce model bias, we contribute a first technique for the rigorous, quantitative screening of medical image datasets. Drawing from literature in the causal inference and information theory domains, our procedure decomposes the risks associated with dataset attributes in terms of their detectability and utility (defined as the amount of information knowing the attribute gives about a task label). To demonstrate the effectiveness and sensitivity of our method, we develop a variety of datasets with synthetically inserted artifacts with different degrees of association to the target label that allow evaluation of inherited model biases via comparison of performance against true counterfactual examples. Using these datasets and results from hundreds of trained models, we show our screening method reliably identifies nearly imperceptible bias-inducing artifacts. Lastly, we apply our method to the natural attributes of a popular skin-lesion dataset and demonstrate its success. Our approach provides a means to perform more systematic algorithmic audits and guide future data collection efforts in pursuit of safer and more reliable models.


Task-based Generation of Optimized Projection Sets using Differentiable Ranking

arXiv.org Artificial Intelligence

We present a method for selecting valuable projections in computed tomography (CT) scans to enhance image reconstruction and diagnosis. The approach integrates two important factors, projection-based detectability and data completeness, into a single feed-forward neural network. The network evaluates the value of projections, processes them through a differentiable ranking function and makes the final selection using a straight-through estimator. Data completeness is ensured through the label provided during training. The approach eliminates the need for heuristically enforcing data completeness, which may exclude valuable projections. The method is evaluated on simulated data in a non-destructive testing scenario, where the aim is to maximize the reconstruction quality within a specified region of interest. We achieve comparable results to previous methods, laying the foundation for using reconstruction-based loss functions to learn the selection of projections.


Rethinking Causality-driven Robot Tool Segmentation with Temporal Constraints

arXiv.org Artificial Intelligence

Purpose: Vision-based robot tool segmentation plays a fundamental role in surgical robots and downstream tasks. CaRTS, based on a complementary causal model, has shown promising performance in unseen counterfactual surgical environments in the presence of smoke, blood, etc. However, CaRTS requires over 30 iterations of optimization to converge for a single image due to limited observability. Method: To address the above limitations, we take temporal relation into consideration and propose a temporal causal model for robot tool segmentation on video sequences. We design an architecture named Temporally Constrained CaRTS (TC-CaRTS). TC-CaRTS has three novel modules to complement CaRTS - temporal optimization pipeline, kinematics correction network, and spatial-temporal regularization. Results: Experiment results show that TC-CaRTS requires much fewer iterations to achieve the same or better performance as CaRTS. TC- CaRTS also has the same or better performance in different domains compared to CaRTS. All three modules are proven to be effective. Conclusion: We propose TC-CaRTS, which takes advantage of temporal constraints as additional observability. We show that TC-CaRTS outperforms prior work in the robot tool segmentation task with improved convergence speed on test datasets from different domains.