Collaborating Authors: Unberath


Beyond Rigid AI: Towards Natural Human-Machine Symbiosis for Intraoperative Surgical Assistance

Seenivasan, Lalithkumar, Xu, Jiru, Mukul, Roger D. Soberanis, Ding, Hao, Byrd, Grayson, Ku, Yu-Chun, Porras, Jose L., Ishii, Masaru, Unberath, Mathias

arXiv.org Artificial Intelligence

Emerging surgical data science and robotics solutions, especially those designed to provide assistance in situ, require natural human-machine interfaces to fully unlock their potential in providing adaptive and intuitive aid. Contemporary AI-driven solutions remain inherently rigid, offering limited flexibility and restricting natural human-machine interaction in dynamic surgical environments: they rely heavily on extensive task-specific pre-training, fixed object categories, and explicit manual prompting. This work introduces a novel Perception Agent that leverages speech-integrated, prompt-engineered large language models (LLMs), the segment anything model (SAM), and any-point tracking foundation models to enable more natural human-machine interaction in real-time intraoperative surgical assistance. Incorporating a memory repository and two novel mechanisms for segmenting unseen elements, the Perception Agent offers the flexibility to segment both known and unseen elements in the surgical scene through intuitive interaction. By memorizing novel elements for use in future surgeries, this work takes a marked step towards human-machine symbiosis in surgical procedures. Through quantitative analysis on a public dataset, we show that the performance of our agent is on par with considerably more labor-intensive manual-prompting strategies. Qualitatively, we show the flexibility of our agent in segmenting novel elements (instruments, phantom grafts, and gauze) in a custom-curated dataset. By offering natural human-machine interaction and overcoming rigidity, our Perception Agent potentially brings AI-based real-time assistance in dynamic surgical environments closer to reality.
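The memory-repository idea above can be illustrated with a minimal sketch (all names hypothetical; the real agent integrates speech input, an LLM, SAM, and any-point tracking, none of which are modeled here): a known element resolves to stored prompt points, while an unseen element falls back to intuitive interaction and is memorized for future procedures.

```python
class PerceptionMemory:
    """Toy memory repository mapping element names to stored prompt points.

    Hypothetical sketch only; the paper's agent pairs such a repository
    with SAM and tracking foundation models.
    """

    def __init__(self):
        self._store = {}  # element name -> list of (x, y) prompt points

    def known(self, name):
        return name in self._store

    def recall(self, name):
        return self._store[name]

    def memorize(self, name, points):
        # Persist prompts for a novel element so it is "known" next time.
        self._store[name] = points


def segment_request(memory, element, ask_user):
    """Return prompt points for `element`, learning unseen elements on the fly."""
    if memory.known(element):
        return memory.recall(element)
    points = ask_user(element)        # fallback: interactive prompting for unseen elements
    memory.memorize(element, points)  # remember for future surgeries
    return points
```

In this sketch, the first request for an unseen element (say, gauze) triggers the interactive fallback; every later request is served directly from memory.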


Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models

Ding, Hao, Seenivasan, Lalithkumar, Shu, Hongchao, Byrd, Grayson, Zhang, Han, Xiao, Pu, Barragan, Juan Antonio, Taylor, Russell H., Kazanzides, Peter, Unberath, Mathias

arXiv.org Artificial Intelligence

Large language model-based (LLM) agents are emerging as a powerful enabler of robust embodied intelligence due to their capability of planning complex action sequences. Sound planning ability is necessary for robust automation in many task domains, but especially in surgical automation. These agents rely on a highly detailed natural language representation of the scene. Thus, to leverage the emergent capabilities of LLM agents for surgical task planning, developing similarly powerful and robust perception algorithms is necessary to derive a detailed scene representation of the environment from visual input. Previous research has focused primarily on enabling LLM-based task planning while adopting simple yet severely limited perception solutions that meet the needs of bench-top experiments but lack the critical flexibility to scale to less constrained settings. In this work, we propose an alternative: a digital twin-based machine perception approach that capitalizes on the convincing performance and out-of-the-box generalization of recent vision foundation models. Integrating our digital twin-based scene representation and LLM agent for planning with the dVRK platform, we develop an embodied intelligence system and evaluate its robustness in performing peg transfer and gauze retrieval tasks. Our approach shows strong task performance and generalizability to varied environment settings. Despite convincing performance, this work is merely a first step towards the integration of digital twin-based scene representations. Future studies are necessary for the realization of a comprehensive digital twin framework to improve the interpretability and generalizability of embodied intelligence in surgery.
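The core idea above can be sketched as turning a structured, digital-twin-style scene state into the detailed natural-language representation an LLM planner consumes; the object schema and wording here are illustrative assumptions, not the paper's actual format.

```python
def describe_scene(objects):
    """Render a digital-twin scene state as natural language for an LLM planner.

    `objects` is a list of dicts with hypothetical keys: `name`,
    `position` (x, y, z in meters), and an optional `held_by` field
    naming a manipulator, e.g. a dVRK patient-side arm.
    """
    lines = []
    for obj in objects:
        x, y, z = obj["position"]
        sentence = f"{obj['name']} is at ({x:.2f}, {y:.2f}, {z:.2f})"
        if obj.get("held_by"):
            sentence += f", held by {obj['held_by']}"
        lines.append(sentence + ".")
    return " ".join(lines)
```

A description like this can be prepended to the planner's prompt so the LLM reasons over explicit object positions rather than raw pixels.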


FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation

Killeen, Benjamin D., Wang, Liam J., Zhang, Han, Armand, Mehran, Taylor, Russell H., Dreizin, Dave, Osgood, Greg, Unberath, Mathias

arXiv.org Artificial Intelligence

Automated X-ray image segmentation would accelerate research and development in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving specific image analysis problems, but the utility of these models is restricted to their particular task domain, and expanding to broader use requires additional data, labels, and retraining efforts. Recently, foundation models (FMs) -- machine learning models trained on large amounts of highly variable data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing FMs for medical image analysis focus on scenarios and modalities where objects are clearly defined by visually apparent boundaries, such as surgical tool segmentation in endoscopy. X-ray imaging, by contrast, does not generally offer such clearly delineated boundaries or structure priors. During X-ray image formation, complex 3D structures are projected in transmission onto the imaging plane, resulting in overlapping features of varying opacity and shape. To pave the way toward an FM for comprehensive and automated analysis of arbitrary medical X-ray images, we develop FluoroSAM, a language-aligned variant of the Segment-Anything Model, trained from scratch on 1.6M synthetic X-ray images. FluoroSAM is trained on data including masks for 128 organ types and 464 non-anatomical objects, such as tools and implants. In real X-ray images of cadaveric specimens, FluoroSAM is able to segment bony anatomical structures based on text-only prompting with 0.51 DICE, rising to 0.79 DICE with point-based refinement, outperforming competing SAM variants for all structures. FluoroSAM is also capable of zero-shot generalization to segmenting classes beyond the training set thanks to its language alignment, which we demonstrate for full lung segmentation on real chest X-rays.
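The DICE scores quoted above measure the overlap between a predicted mask and a reference mask: the coefficient is 2|A∩B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). A small illustrative sketch, treating binary masks as flat 0/1 sequences:

```python
def dice(pred, ref):
    """DICE coefficient between two binary masks given as flat 0/1 sequences."""
    assert len(pred) == len(ref), "masks must have the same number of pixels"
    intersection = sum(p & r for p, r in zip(pred, ref))  # |A ∩ B|
    total = sum(pred) + sum(ref)                          # |A| + |B|
    # Convention: two empty masks agree perfectly.
    return 2.0 * intersection / total if total else 1.0
```

Identical masks score 1.0, disjoint masks 0.0, and partial overlap falls in between, which is why refinement from 0.51 to 0.79 reflects a substantial gain in mask quality.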


Synthetic data for AI outperform real data in robot-assisted surgery


While artificial intelligence continues to transform health care, the tech has an Achilles heel: training AI systems to perform specific tasks requires a great deal of annotated data that engineers sometimes just don't have or cannot get. In a perfect world, researchers would be able to digitally generate the exact data they need when they need it, unlocking new capabilities of AI. In reality, however, even digitally generating this data is tricky because real-world data, especially in medicine, is complex and multi-faceted. But solutions are in the pipeline. Researchers in the Whiting School of Engineering's Laboratory for Computational Sensing and Robotics have created software to realistically simulate the data necessary for developing AI algorithms that perform important tasks in surgery, such as X-ray image analysis.


With a little help from AI


The U.S. economy has been rocked by the coronavirus pandemic, with stock values crashing this year between late February and late March, when states began issuing stay-at-home orders. One intrepid digital company saw its stock surge, however: Zoom Video Communications, Inc., which has more than doubled its stock price while the economy around it crashed. But why has Zoom become the go-to platform during the pandemic, when there are dozens of other video conferencing services out there? The answer lies in Zoom's intuitive interface, says Mathias Unberath, an assistant professor of computer science at the Johns Hopkins Whiting School of Engineering and a member of the Malone Center for Engineering and Healthcare. "Whether someone is hosting a work meeting or a baby shower, the app is easy to use," he says.