abdomen


Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies

Jiang, Luyi, Chen, Jiayuan, Lu, Lu, Peng, Xinwei, Liu, Lihao, He, Junjun, Xu, Jie

arXiv.org Artificial Intelligence

In recent years, large language models (LLMs), empowered by massive text corpora and deep learning techniques, have demonstrated breakthrough advancements in cross-domain knowledge transfer and human-machine dialogue interactions [1]. Within the healthcare domain, LLMs are increasingly deployed across nine core application scenarios, including intelligent diagnosis, personalized treatment, and drug discovery, garnering significant attention from both academia and industry [2, 3]. A particularly important area of focus is the development and evaluation of Chinese medical LLMs, which face unique challenges due to the specialized nature of medical knowledge and the high-stakes implications of clinical decision-making. Hence, ensuring the reliability and safety of these models has become critical, necessitating rigorous evaluation frameworks [4]. Current research on medical LLM evaluation exhibits two predominant trends. On one hand, general-domain benchmarks (e.g., HELM [5], MMLU [6]) assess foundational model capabilities through medical knowledge tests. On the other hand, specialized medical evaluation systems (e.g., MedQA [7], C-Eval-Medical [8]) emphasize clinical reasoning and ethical compliance. Notably, the MedBench framework [9], jointly developed by institutions including Shanghai AI Laboratory, has emerged as the most influential benchmark for Chinese medical LLMs. By establishing a standardized evaluation system spanning five dimensions, including medical language comprehension, complex reasoning, and safety ethics, it has attracted participation from hundreds of research teams.
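The dimension-wise scoring such a benchmark performs can be sketched in a few lines. This is a minimal illustration, not MedBench's actual pipeline, and the dimension names used here are placeholders rather than its real taxonomy:

```python
from collections import defaultdict

def score_by_dimension(results):
    """Aggregate per-item correctness into per-dimension accuracy.

    `results` is a list of (dimension, is_correct) pairs; the dimension
    labels in the example are illustrative only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for dim, ok in results:
        totals[dim] += 1
        correct[dim] += int(ok)
    return {dim: correct[dim] / totals[dim] for dim in totals}

results = [
    ("language_comprehension", True),
    ("language_comprehension", False),
    ("complex_reasoning", True),
    ("safety_ethics", True),
]
print(score_by_dimension(results))
```

Per-dimension scores like these, rather than a single aggregate number, are what let a benchmark expose where a model's performance gaps actually lie.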


Abdominal Undulation with Compliant Mechanism Improves Flight Performance of Biomimetic Robotic Butterfly

Lian, Xuyi, Luo, Mingyu, Lin, Te, Qian, Chen, Li, Tiefeng

arXiv.org Artificial Intelligence

Abstract-- This paper presents the design, modeling, and experimental validation of a biomimetic robotic butterfly (BRB) that integrates a compliant mechanism to achieve coupled wing-abdomen motion. Drawing inspiration from the natural flight dynamics of butterflies, a theoretical model is developed to investigate the impact of abdominal undulation on flight performance. To validate the model, motion capture experiments are conducted on three configurations: a BRB without an abdomen, with a fixed abdomen, and with an undulating abdomen. Recently, flapping-wing aerial vehicles (FWAVs) have attracted increasing attention, demonstrating advantages in maneuverability, energy efficiency, and adaptability that make them ideal for potential applications, and significant progress has been made over past decades in designing bio-inspired FWAVs. Because the butterfly wings attached to the thorax have a relatively high moment of inertia, aerodynamic and inertial forces cause the thorax to pitch in sync with the wingbeats; during forward flight, the abdomen swings in response to these thoracic oscillations [13], [14], [15].


Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential

Gao, Chenyu, Xu, Kaiwen, Kim, Michael E., Zuo, Lianrui, Li, Zhiyuan, Archer, Derek B., Hohman, Timothy J., Moore, Ann Zenobia, Ferrucci, Luigi, Beason-Held, Lori L., Resnick, Susan M., Davatzikos, Christos, Prince, Jerry L., Landman, Bennett A.

arXiv.org Artificial Intelligence

Defacing is often applied to head magnetic resonance image (MRI) datasets prior to public release to address privacy concerns. The alteration of facial and nearby voxels has provoked discussions about the true capability of these techniques to ensure privacy as well as their impact on downstream tasks. With advancements in deep generative models, the extent to which defacing can protect privacy is uncertain. Additionally, while the altered voxels are known to contain valuable anatomical information, their potential to support research beyond the anatomical regions directly affected by defacing remains uncertain. To evaluate these considerations, we develop a refacing pipeline that recovers faces in defaced head MRIs using cascaded diffusion probabilistic models (DPMs). The DPMs are trained on images from 180 subjects and tested on images from 484 unseen subjects, 469 of whom are from a different dataset. To assess whether the altered voxels in defacing contain universally useful information, we also predict computed tomography (CT)-derived skeletal muscle radiodensity from facial voxels in both defaced and original MRIs. The results show that DPMs can generate high-fidelity faces that resemble the original faces from defaced images, with surface distances to the original faces significantly smaller than those of a population average face (p < 0.05). This performance also generalizes well to previously unseen datasets. For skeletal muscle radiodensity predictions, using defaced images results in significantly weaker Spearman's rank correlation coefficients compared to using original images (p < 10^-4). For shin muscle, the correlation is statistically significant (p < 0.05) when using original images but not statistically significant (p > 0.05) when any defacing method is applied, suggesting that defacing might not only fail to protect privacy but also eliminate valuable information.


EnchantedClothes: Visual and Tactile Feedback with an Abdomen-Attached Robot through Clothes

Yamamoto, Takumi, Yoshimura, Rin, Sugiura, Yuta

arXiv.org Artificial Intelligence

Wearable robots are designed to be worn on the human body. Taking advantage of their physical form, various applications for wearable robots are being considered. This study proposes a wearable robot worn on the abdomen and a new interaction with it. Our robot enables a variety of applications related to communication between the wearer and surrounding humans through visual and tactile feedback. The contributions of this research are (1) the proposal of a novel wearable robot worn on the abdomen and (2) a new interaction with it.


Prompt Injection Attacks on Large Language Models in Oncology

Clusmann, Jan, Ferber, Dyke, Wiest, Isabella C., Schneider, Carolin V., Brinker, Titus J., Foersch, Sebastian, Truhn, Daniel, Kather, Jakob N.

arXiv.org Artificial Intelligence

Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be attacked by prompt injection attacks, which can cause them to output harmful information simply through interaction with the VLM, without any access to its parameters. We performed a quantitative study to evaluate the vulnerabilities to these attacks in four state-of-the-art VLMs which have been proposed to be of utility in healthcare: Claude 3 Opus, Claude 3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N=297 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.
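How little an embedded prompt needs to alter an image can be shown with a toy sketch. The grayscale array, text mask, and 2-gray-level delta below are hypothetical, chosen only to illustrate that the perturbation stays far below ordinary visual thresholds; the actual attacks in the study target real medical images:

```python
def embed_subvisual(image, mask, delta=2):
    """Return a copy of `image` with masked pixels nudged by `delta` gray levels.

    A toy illustration of sub-visual prompt embedding: the change is far
    below what a human viewer would notice, yet a text-reading model may
    still pick up the stamped pattern.
    """
    out = [row[:] for row in image]
    for r, row in enumerate(mask):
        for c, bit in enumerate(row):
            if bit:
                out[r][c] = min(255, out[r][c] + delta)
    return out

# Uniform mid-gray "image" and a hypothetical 1-bit text mask.
image = [[128] * 8 for _ in range(4)]
mask = [[1 if (r + c) % 3 == 0 else 0 for c in range(8)] for r in range(4)]
stamped = embed_subvisual(image, mask)
max_diff = max(abs(a - b) for ra, rb in zip(image, stamped)
               for a, b in zip(ra, rb))
print(max_diff)  # perturbation amplitude stays at 2 gray levels
```

The point of the sketch is the asymmetry the paper exploits: a 2-level shift on a 0-255 scale is invisible to observers, while high-resolution OCR inside a VLM can still recover the injected instruction.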


Automated classification of multi-parametric body MRI series

Kim, Boah, Mathai, Tejas Sudharshan, Helm, Kimberly, Summers, Ronald M.

arXiv.org Artificial Intelligence

Multi-parametric MRI (mpMRI) studies are widely available in clinical practice for the diagnosis of various diseases. As the volume of mpMRI exams increases yearly, there are concomitant inaccuracies within the DICOM header fields of these exams. This precludes the use of the header information for the arrangement of the different series as part of the radiologist's hanging protocol, and clinician oversight is needed for correction. In this pilot work, we propose an automated framework to classify eight different series types in mpMRI studies. We used 1,363 studies acquired by three Siemens scanners to train a DenseNet-121 model with 5-fold cross-validation. Then, we evaluated the performance of the DenseNet-121 ensemble on a held-out test set of 313 mpMRI studies. Our method achieved an average precision of 96.6%, sensitivity of 96.6%, specificity of 99.6%, and F1 score of 96.6% for the MRI series classification task. To the best of our knowledge, we are the first to develop a method to classify the series type in mpMRI studies acquired at the level of the chest, abdomen, and pelvis. Our method has the capability for robust automation of hanging protocols in modern radiology practice.
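The reported metrics all derive from a multi-class confusion matrix. A minimal sketch of computing them per class, assuming the conventional `cm[true][pred]` layout (the layout and averaging choices are assumptions, not details taken from the paper):

```python
def per_class_metrics(cm):
    """Precision, sensitivity (recall), specificity, and F1 per class
    from a square confusion matrix cm[true][pred]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    out = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # missed class-k items
        fp = sum(cm[i][k] for i in range(n)) - tp  # items wrongly called k
        tn = total - tp - fn - fp
        prec = tp / (tp + fp) if tp + fp else 0.0
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        out.append({"precision": prec, "sensitivity": sens,
                    "specificity": spec, "f1": f1})
    return out
```

Averaging these per-class values across the eight series types yields summary figures like the 96.6% precision and 99.6% specificity reported above; specificity runs higher because each class's true negatives dominate in a multi-class setting.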


Dense 3D Reconstruction Through Lidar: A Comparative Study on Ex-vivo Porcine Tissue

Caccianiga, Guido, Nubert, Julian, Hutter, Marco, Kuchenbecker, Katherine J.

arXiv.org Artificial Intelligence

New sensing technologies and more advanced processing algorithms are transforming computer-integrated surgery. While researchers are actively investigating depth sensing and 3D reconstruction for vision-based surgical assistance, it remains difficult to achieve real-time, accurate, and robust 3D representations of the abdominal cavity for minimally invasive surgery. Thus, this work uses quantitative testing on fresh ex-vivo porcine tissue to thoroughly characterize the quality with which a 3D laser-based time-of-flight sensor (lidar) can perform anatomical surface reconstruction. Ground-truth surface shapes are captured with a commercial laser scanner, and the resulting signed error fields are analyzed using rigorous statistical tools. When compared to modern learning-based stereo matching from endoscopic images, time-of-flight sensing demonstrates higher precision, lower processing delay, higher frame rate, and superior robustness against sensor distance and poor illumination. Furthermore, we report on the potential negative effect of near-infrared light penetration on the accuracy of lidar measurements across different tissue samples, identifying a significant measured depth offset for muscle in contrast to fat and liver. Our findings highlight the potential of lidar for intraoperative 3D perception and point toward new methods that combine complementary time-of-flight and spectral imaging.
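The analysis of signed error fields reduces to simple summary statistics over point-to-surface distances. A minimal sketch, under the assumption that a positive error means the lidar surface lies above the ground-truth laser scan (the sign convention is an assumption; the paper's statistical testing goes beyond these summaries):

```python
import math

def signed_error_stats(errors):
    """Summary statistics for a signed error field.

    The mean (bias) exposes systematic depth offsets, such as the one
    reported for muscle under near-infrared light penetration, while the
    RMSE captures overall reconstruction error.
    """
    n = len(errors)
    mean = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    return {"bias": mean, "rmse": rmse, "std": std}
```

Separating bias from spread matters here: a tissue-dependent depth offset shifts the mean without necessarily widening the distribution, which is exactly the signature of the muscle measurements described above.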


Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies

Chen, Pengcheng, Huang, Ziyan, Deng, Zhongying, Li, Tianbin, Su, Yanzhou, Wang, Haoyu, Ye, Jin, Qiao, Yu, He, Junjun

arXiv.org Artificial Intelligence

OpenAI's latest large vision-language model (LVLM), GPT-4V(ision), has piqued considerable interest for its potential in medical applications. Despite its promise, recent studies and internal reviews highlight its underperformance in specialized medical tasks. This paper explores the boundary of GPT-4V's capabilities in medicine, particularly in processing complex imaging data from endoscopies, CT scans, MRIs, and other modalities. Leveraging open-source datasets, we assessed its foundational competencies, identifying substantial areas for enhancement. Our research emphasizes prompt engineering, an often-underutilized strategy for improving AI responsiveness. Through iterative testing, we refined the model's prompts, significantly improving its interpretative accuracy and relevance in medical imaging. From our comprehensive evaluations, we distilled 10 effective prompt engineering techniques, each fortifying GPT-4V's medical acumen. These methodical enhancements facilitate more reliable, precise, and clinically valuable insights from GPT-4V, advancing its operability in critical healthcare environments. Our findings are pivotal for those employing AI in medicine, providing clear, actionable guidance on harnessing GPT-4V's full diagnostic potential.
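A prompt built by stacking such techniques might be composed as follows. The three technique snippets here are illustrative stand-ins (role assignment, chain-of-thought elicitation, output-format constraint), not the ten techniques the paper distills:

```python
def build_prompt(modality, question, techniques=()):
    """Compose a structured medical-imaging prompt from named technique
    snippets; the snippet texts are hypothetical examples."""
    snippets = {
        "role": "You are an experienced radiologist.",
        "steps": "Reason step by step before giving a final answer.",
        "format": "Give a short impression, then a one-line conclusion.",
    }
    parts = [snippets[t] for t in techniques if t in snippets]
    parts.append(f"Modality: {modality}")
    parts.append(question)
    return "\n".join(parts)

print(build_prompt("CT", "Is there free air under the diaphragm?",
                   techniques=("role", "steps")))
```

Keeping each technique as a separate, toggleable snippet is what makes the kind of iterative testing described above tractable: techniques can be ablated one at a time and their effect on accuracy measured.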


FPUS23: An Ultrasound Fetus Phantom Dataset with Deep Neural Network Evaluations for Fetus Orientations, Fetal Planes, and Anatomical Features

Prabakaran, Bharath Srinivas, Hamelmann, Paul, Ostrowski, Erik, Shafique, Muhammad

arXiv.org Artificial Intelligence

Ultrasound imaging is one of the most prominent technologies to evaluate the growth, progression, and overall health of a fetus during its gestation. However, the interpretation of the data obtained from such studies is best left to expert physicians and technicians who are trained and well-versed in analyzing such images. To improve the clinical workflow and potentially develop an at-home ultrasound-based fetal monitoring platform, we present a novel fetus phantom ultrasound dataset, FPUS23, which can be used to identify (1) the correct diagnostic planes for estimating fetal biometric values, (2) fetus orientation, (3) anatomical features, and (4) bounding boxes of the fetus phantom anatomies at 23 weeks gestation. The entire dataset is composed of 15,728 images, which are used to train four different Deep Neural Network models, built upon a ResNet34 backbone, for detecting the aforementioned fetal features and use cases. We have also evaluated the models trained using our FPUS23 dataset, to show that the information learned by these models can be used to substantially increase the accuracy on real-world ultrasound fetus datasets. We make the FPUS23 dataset and the pre-trained models publicly accessible at https://github.com/bharathprabakaran/FPUS23, which will further facilitate future research on fetal ultrasound imaging and analysis.
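Evaluating the bounding-box detections described above typically relies on intersection-over-union (IoU) between predicted and annotated boxes. A minimal sketch, assuming the common corner format `(x1, y1, x2, y2)` (the format is an assumption, not a stated detail of FPUS23):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is how per-anatomy detection accuracy on a dataset like this would be scored.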


The Daring Robot Surgery That Saved a Man's Life

WIRED

IN EARLY APRIL 2020, shortly after the British prime minister Boris Johnson had announced the first pandemic lockdown in the United Kingdom, a urologist named Archie Fernando reached out to one of her colleagues, Nadine Hachach-Haram. The two doctors worked at Guy's and St Thomas' hospital, one of the busiest in the country, at a time when nearly a thousand people were dying of Covid-19 every week. Most surgeries were being deferred, except for life-or-limb cases and urgent cancer surgeries, and Hachach-Haram, who is a reconstructive plastic surgeon, recalls how useless she felt. "I would just walk into the wards and ask the nurses what I could do to help," she says. "I started doing everything, like portering and proning, turning patients over to make their breathing slightly better."