Evaluating Generative AI as an Educational Tool for Radiology Resident Report Drafting

Verdone, Antonio, Cardall, Aidan, Siddiqui, Fardeen, Nashawaty, Motaz, Rigau, Danielle, Kwon, Youngjoon, Yousef, Mira, Patel, Shalin, Kieturakis, Alex, Kim, Eric, Heacock, Laura, Reig, Beatriu, Shen, Yiqiu

arXiv.org Artificial Intelligence

Objective: Radiology residents require timely, personalized feedback to develop accurate image analysis and reporting skills. Increasing clinical workload often limits attendings' ability to provide guidance. This study evaluates a HIPAA-compliant GPT-4o system that delivers automated feedback on breast imaging reports drafted by residents in real clinical settings. Methods: We analyzed 5,000 resident-attending report pairs from routine practice at a multi-site U.S. health system. GPT-4o was prompted with clinical instructions to identify common errors and provide feedback. A reader study using 100 report pairs was conducted. Four attending radiologists and four residents independently reviewed each pair, determined whether predefined error types were present, and rated GPT-4o's feedback as helpful or not. Agreement between GPT-4o and readers was assessed using percent match. Inter-reader reliability was measured with Krippendorff's alpha. Educational value was measured as the proportion of cases rated helpful. Results: Three common error types were identified: (1) omission or addition of key findings, (2) incorrect use or omission of technical descriptors, and (3) final assessment inconsistent with findings. GPT-4o showed strong agreement with attending consensus: 90.5%, 78.3%, and 90.4% across error types. Inter-reader reliability showed moderate variability (α = 0.767, 0.595, 0.567), and replacing a human reader with GPT-4o did not significantly affect agreement (Δ = -0.004 to 0.002). GPT-4o's feedback was rated helpful in most cases: 89.8%, 83.0%, and 92.0%. Discussion: GPT-4o can reliably identify key educational errors. It may serve as a scalable tool to support radiology education.
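The two agreement statistics named in this abstract, percent match and Krippendorff's alpha, are standard and easy to sketch. The following is a minimal implementation for nominal labels with no missing ratings; the function names are illustrative and not from the paper's code.

```python
from collections import Counter

def percent_match(pred, consensus):
    """Fraction of cases on which two nominal label sequences agree."""
    assert len(pred) == len(consensus)
    return sum(p == c for p, c in zip(pred, consensus)) / len(pred)

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data when every rater labels every unit.

    `units` is a list of per-case label lists, one label per rater.
    Builds the coincidence matrix, then alpha = 1 - D_obs / D_exp.
    """
    o = Counter()  # coincidence matrix over ordered label pairs
    for labels in units:
        m = len(labels)
        for i, a in enumerate(labels):
            for j, b in enumerate(labels):
                if i != j:
                    o[(a, b)] += 1 / (m - 1)
    n = sum(o.values())
    n_c = Counter()  # marginal frequency of each label value
    for (a, _), w in o.items():
        n_c[a] += w
    d_obs = sum(w for (a, b), w in o.items() if a != b)
    d_exp = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    return 1.0 - (n - 1) * d_obs / d_exp
```

With perfect agreement the off-diagonal coincidences vanish and alpha is exactly 1; chance-level labeling drives it toward 0, which is what makes alpha a stricter reliability measure than raw percent match.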


Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making

Chen, Kaitao, Liu, Mianxin, Zong, Daoming, Ding, Chaoyue, Rui, Shaohao, Jiang, Yankai, Zhou, Mu, Wang, Xiaosong

arXiv.org Artificial Intelligence

Complex medical decision-making involves cooperative workflows operated by different clinicians. Designing AI multi-agent systems can expedite and augment human-level clinical decision-making. Existing multi-agent research primarily focuses on language-only tasks, and its extension to multimodal scenarios remains challenging. A blind combination of diverse vision-language models (VLMs) can amplify an erroneous outcome interpretation. VLMs are generally less capable at instruction following and, importantly, self-reflection than large language models (LLMs) of comparable size. This disparity largely constrains VLMs' ability to participate in cooperative workflows. In this study, we propose MedOrch, a mediator-guided multi-agent collaboration framework for medical multimodal decision-making. MedOrch employs an LLM-based mediator agent that enables multiple VLM-based expert agents to exchange and reflect on their outputs toward a collaborative decision. We utilize multiple open-source general-purpose and domain-specific VLMs instead of costly GPT-series models, revealing the strength of heterogeneous models. We show that collaboration among distinct VLM-based agents can surpass the capabilities of any individual agent. We validate our approach on five medical visual question answering benchmarks, demonstrating superior collaboration performance without model training. Our findings underscore the value of mediator-guided multi-agent collaboration in advancing medical multimodal intelligence.
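The mediator pattern the abstract describes, collect expert answers, share them back for reflection, then settle on a consensus, can be sketched in a few lines. The agents here are stubbed as plain callables and every name is illustrative; this is not MedOrch's actual interface.

```python
def mediate(experts, question, rounds=2):
    """Mediator loop over expert agents (hypothetical sketch).

    Each expert is a callable (question, peer_answers) -> answer.
    Round 1 collects independent answers; later rounds let each
    expert reflect on its peers' answers; the mediator returns the
    majority answer.
    """
    answers = [expert(question, []) for expert in experts]
    for _ in range(rounds - 1):
        answers = [expert(question, answers) for expert in experts]
    return max(set(answers), key=answers.count)
```

In the real system the mediator itself is an LLM that summarizes and critiques the exchanged outputs rather than a simple majority vote, but the control flow is the same.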


Interpretable Data Mining of Follicular Thyroid Cancer Ultrasound Features Using Enhanced Association Rules

Zhou, Songlin, Zhou, Tao, Li, Xin, Yau, Stephen Shing-Toung

arXiv.org Artificial Intelligence

Purpose: Thyroid cancer is a common cancer, and papillary thyroid cancer and follicular thyroid cancer are its two most common types. Follicular thyroid cancer lacks distinctive ultrasound signs and is more difficult to diagnose preoperatively than the more prevalent papillary thyroid cancer, and the clinical studies associated with it are less well established. We aimed to analyze clinical data on follicular thyroid cancer with a novel data mining tool to identify clinical indications that may help in preoperative diagnosis. Methods: We performed a retrospective analysis of case data collected by the Department of General Surgery of Peking University Third Hospital between 2010 and 2023. Unlike traditional statistical methods, we improved association rule mining, a classical data mining method, and proposed new analytical metrics that reflect the malignant association between clinical indications and cancer, drawing on the idea of the SHAP method from interpretable machine learning. Results: The preprocessed dataset contained 1673 cases (counted as nodules rather than patients), of which 1414 were benign and 259 were malignant. Our analysis showed that in addition to some common indicators (e.g., irregular or lobulated nodal margins, a halo of uneven thickness, hypoechogenicity), some indicators had strong malignant associations, such as a nodule-in-nodule pattern, a trabecular pattern, and low TSH scores. In addition, our results suggest that coexisting Hashimoto's thyroiditis may also have a strong malignant association. Conclusion: In the preoperative diagnosis of nodules suspected of follicular thyroid cancer, multiple clinical indications should be considered for a more accurate diagnosis. The diverse malignant associations identified in our study may serve as a reference for clinicians in related fields.
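The classical association-rule metrics underlying this kind of analysis, support, confidence, and lift for a rule of the form "finding → malignant", can be sketched as below. The paper's SHAP-inspired refinement of these metrics is not reproduced here; the records and names are illustrative.

```python
def rule_metrics(records, feature):
    """Support, confidence, and lift for the rule `feature -> malignant`.

    `records` is a list of (set_of_findings, is_malignant) pairs.
    Lift > 1 means the finding co-occurs with malignancy more often
    than chance would predict.
    """
    n = len(records)
    n_feat = sum(1 for findings, _ in records if feature in findings)
    n_label = sum(1 for _, malignant in records if malignant)
    n_both = sum(1 for findings, malignant in records
                 if feature in findings and malignant)
    support = n_both / n
    confidence = n_both / n_feat if n_feat else 0.0
    lift = confidence / (n_label / n) if n_label else 0.0
    return support, confidence, lift
```

A toy call: with four nodules of which two are malignant and "halo" appears in one benign and one malignant case, the rule "halo → malignant" gets confidence 0.5 and lift 1.0, i.e., no malignant association beyond the base rate.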


What Can We Learn from Inter-Annotator Variability in Skin Lesion Segmentation?

Abhishek, Kumar, Kawahara, Jeremy, Hamarneh, Ghassan

arXiv.org Artificial Intelligence

Medical image segmentation exhibits intra- and inter-annotator variability due to ambiguous object boundaries, annotator preferences, expertise, and tools, among other factors. Lesions with ambiguous boundaries, e.g., spiculated or infiltrative nodules, or irregular borders per the ABCD rule, are particularly prone to disagreement and are often associated with malignancy. In this work, we curate IMA++, the largest multi-annotator skin lesion segmentation dataset, on which we conduct an in-depth study of variability due to annotator, malignancy, tool, and skill factors. We find a statistically significant (p < 0.001) association between inter-annotator agreement (IAA), measured using Dice, and the malignancy of skin lesions. We further show that IAA can be accurately predicted directly from dermoscopic images, achieving a mean absolute error of 0.108. Finally, we leverage this association by utilizing IAA as a "soft" clinical feature within a multi-task learning objective, yielding a 4.2% improvement in balanced accuracy averaged across multiple model architectures and across IMA++ and four public dermoscopic datasets.
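The Dice-based inter-annotator agreement used here reduces to pairwise mask overlap. A minimal sketch, assuming the segmentation masks are already flattened to 0/1 sequences (function names are illustrative):

```python
from itertools import combinations

def dice(a, b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * inter / total if total else 1.0  # two empty masks agree fully

def mean_pairwise_dice(masks):
    """IAA for one lesion: mean Dice over all pairs of annotator masks."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)
```

Averaging over all annotator pairs rather than comparing against one reference keeps the measure symmetric, so no annotator is privileged as ground truth.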


The Problem With Early Cancer Detection

The New Yorker

The discovery began, as many breakthroughs do, with an observation that didn't quite make sense. In 1948, two French researchers, Paul Mandel and Pierre Métais, published a little-noticed paper in a scientific journal. Working in a laboratory in Strasbourg, they had been cataloguing the chemical contents of blood plasma, that river of life teeming with proteins, sugars, waste, nutrients, and cellular debris. Amid this familiar inventory, they'd spotted an unexpected presence: fragments of DNA drifting freely. The finding defied biological orthodoxy. DNA was thought to remain locked inside the nuclei of cells, not to float around on its own.


BI-RADS prediction of mammographic masses using uncertainty information extracted from a Bayesian Deep Learning model

Chegini, Mohaddeseh, Mahloojifar, Ali

arXiv.org Artificial Intelligence

The BI-RADS score is a probabilistic reporting tool used by radiologists to express the level of uncertainty in predicting breast cancer from morphological features in mammography images. There is significant variability in describing masses, which sometimes leads to BI-RADS misclassification, so a BI-RADS prediction system is needed to support radiologists' final decisions. In this study, the uncertainty information extracted by a Bayesian deep learning model is used to predict the BI-RADS score. Results based on the pathology information show that the F1-scores of the radiologist's predictions are 42.86%, 48.33%, and 48.28%, while the model's F1-scores are 73.33%, 59.60%, and 59.26% on the BI-RADS 2, 3, and 5 dataset samples, respectively. The model can also distinguish malignant from benign samples in the BI-RADS 0 category of the dataset with an accuracy of 75.86% and correctly identifies all malignant samples as BI-RADS 5. Grad-CAM visualization shows that the model attends to the morphological features of the lesions. This study therefore shows that an uncertainty-aware Bayesian deep learning model can report its uncertainty about the malignancy of a lesion based on morphological features, much like a radiologist.
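The kind of uncertainty a Bayesian deep learning model extracts is often summarized as the predictive entropy of Monte Carlo forward passes (e.g., MC dropout). A minimal sketch, assuming the per-pass class probability vectors are already available; this is a generic illustration, not the paper's exact pipeline:

```python
import math

def predictive_entropy(mc_probs):
    """Entropy (in nats) of the mean class distribution over MC passes.

    `mc_probs` is a list of probability vectors, one per stochastic
    forward pass. Higher entropy means the model is less certain
    about the lesion's class.
    """
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs)
            for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0)
```

When the passes all agree the entropy is 0; when they contradict each other the averaged distribution flattens and the entropy approaches log of the number of classes, which is the signal a BI-RADS predictor can exploit.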


PersonalizedUS: Interpretable Breast Cancer Risk Assessment with Local Coverage Uncertainty Quantification

Fröhlich, Alek, Ramos, Thiago, Cabello, Gustavo, Buzatto, Isabela, Izbicki, Rafael, Tiezzi, Daniel

arXiv.org Machine Learning

Correctly assessing the malignancy of breast lesions identified during ultrasound examinations is crucial for effective clinical decision-making. However, the current "gold standard" relies on manual BI-RADS scoring by clinicians, often leading to unnecessary biopsies and a significant mental health burden on patients and their families. In this paper, we introduce PersonalizedUS, an interpretable machine learning system that leverages recent advances in conformal prediction to provide precise and personalized risk estimates with local coverage guarantees and sensitivity, specificity, and predictive values above 0.9 across various threshold levels. In particular, we identify meaningful lesion subgroups where distribution-free, model-agnostic conditional coverage holds, with approximately 90% of our prediction sets containing only the ground truth in most lesion subgroups, thus explicitly characterizing for which patients the model is most suitably applied. Moreover, we make available a curated tabular dataset of 1936 biopsied breast lesions from a recent observational multicenter study and benchmark the performance of several state-of-the-art learning algorithms. We also report a successful case study of the deployed system in the same multicenter context. Concrete clinical benefits include up to a 65% reduction in requested biopsies among BI-RADS 4a and 4b lesions, with minimal to no missed cancer cases.
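The split-conformal machinery behind such prediction sets can be sketched in a few lines. This is the basic marginal-coverage recipe, not the paper's local-coverage variant, and all names are illustrative: calibrate a quantile of nonconformity scores (here, 1 minus the probability of the true class), then include every class whose score stays within it.

```python
import math

def conformal_qhat(cal_scores, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the adjusted quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(class_probs, qhat):
    """All classes whose nonconformity score 1 - p is within qhat."""
    return {c for c, p in class_probs.items() if 1 - p <= qhat}
```

With exchangeable calibration data, the resulting sets contain the true label with probability at least 1 - alpha; a singleton set like {"benign"} is the case the paper exploits to safely defer a biopsy.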


Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

Lasko, Thomas A., Still, John M., Li, Thomas Z., Mota, Marco Barbero, Stead, William W., Strobl, Eric V., Landman, Bennett A., Maldonado, Fabien

arXiv.org Artificial Intelligence

Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.


Breast cancer breakthrough: AI predicts a third of cases prior to diagnosis in mammography study

FOX News

Artificial intelligence could have the capability to pinpoint cancer diagnoses a lot sooner. A new study published in the journal Radiology last week noted that AI helped predict one-third of breast cancer cases up to two years prior to diagnosis. The research surveyed imaging data and screening information from BreastScreen Norway exams performed from January 2004 to December 2019. Women who were later diagnosed with breast cancer based on these exams were given an AI risk score by a "commercially available AI system," according to the study's findings. The scores were ranked 1-7 for low-risk malignancy, 8-9 for intermediate risk and 10 for high-risk malignancy.


Text Classification of Cancer Clinical Trial Eligibility Criteria

Yang, Yumeng, Jayaraj, Soumya, Ludmir, Ethan B, Roberts, Kirk

arXiv.org Artificial Intelligence

Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.