 pneumothorax




d61e9e58ae1058322bc169943b39f1d8-Paper.pdf

Neural Information Processing Systems

Set prediction tasks require the matching between predicted set and ground truth set in order to propagate the gradient signal. Recent works have performed this matching in the original feature space, thus requiring predefined distance functions.
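The matching step described above can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes squared Euclidean distance as the predefined distance function and brute-forces every permutation, which is only practical for very small sets (the Hungarian algorithm is the usual choice at scale). The names `sq_dist` and `match_sets` are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance function)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_sets(pred, target, dist=sq_dist):
    """Find the one-to-one assignment of predictions to targets with minimal total cost.

    Returns (perm, cost), where perm[i] is the index of the target
    matched to pred[i]. Brute force: O(n!) in the set size.
    """
    if len(pred) != len(target):
        raise ValueError("this sketch assumes sets of equal size")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(target))):
        cost = sum(dist(pred[i], target[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Toy example: the predictions are a shuffled, slightly noisy copy of the targets.
pred = [(0.9, 0.1), (0.1, 1.1), (2.0, 2.1)]
target = [(0.0, 1.0), (2.0, 2.0), (1.0, 0.0)]
perm, cost = match_sets(pred, target)
print(perm, round(cost, 4))
```

Once the assignment is fixed, a training loss can be computed pairwise between each prediction and its matched target, which is what lets the gradient signal reach an unordered output.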


Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting

Sharma, Harshita, Reynolds, Maxwell C., Salvatelli, Valentina, Sykes, Anne-Marie G., Horst, Kelly K., Schwaighofer, Anton, Ilse, Maximilian, Melnichenko, Olesya, Bond-Taylor, Sam, Pérez-García, Fernando, Mugu, Vamshi K., Chan, Alex, Colak, Ceylan, Swartz, Shelby A., Nashawaty, Motassem B., Gonzalez, Austin J., Ouellette, Heather A., Erdal, Selnur B., Schueler, Beth A., Wetscherek, Maria T., Codella, Noel, Jain, Mohit, Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Hyland, Stephanie, Korfiatis, Panos, Khandelwal, Ashish, Alvarez-Valle, Javier

arXiv.org Artificial Intelligence

AI-assisted report generation offers the opportunity to reduce radiologists' workload stemming from expanded screening guidelines, complex cases and workforce shortages, while maintaining diagnostic accuracy. In addition to describing pathological findings in chest X-ray reports, interpreting lines and tubes (L&T) is demanding and repetitive for radiologists, especially with high patient volumes. We introduce MAIRA-X, a clinically evaluated multimodal AI model for longitudinal chest X-ray (CXR) report generation, that encompasses both clinical findings and L&T reporting. Developed using a large-scale, multi-site, longitudinal dataset of 3.1 million studies (comprising 6 million images from 806k patients) from Mayo Clinic, MAIRA-X was evaluated on three holdout datasets and the public MIMIC-CXR dataset, where it significantly improved AI-generated reports over the state of the art on lexical quality, clinical correctness, and L&T-related elements. A novel L&T-specific metrics framework was developed to assess accuracy in reporting attributes such as type, longitudinal change and placement. A first-of-its-kind retrospective user evaluation study was conducted with nine radiologists of varying experience, who blindly reviewed 600 studies from distinct subjects. The user study found comparable rates of critical errors (3.0% for original vs. 4.6% for AI-generated reports) and a similar rate of acceptable sentences (97.8% for original vs. 97.4% for AI-generated reports), marking a significant improvement over prior user studies with larger gaps and higher error rates. Our results suggest that MAIRA-X can effectively assist radiologists, particularly in high-volume clinical settings.



Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation

Li, Frank, Dapamede, Theo, Chavoshi, Mohammadreza, Jeon, Young Seok, Khosravi, Bardia, Dere, Abdulhameed, Brown-Mulry, Beatrice, Isaac, Rohan Satya, Mansuri, Aawez, Sanyika, Chiratidzo, Newsome, Janice, Purkayastha, Saptarshi, Banerjee, Imon, Trivedi, Hari, Gichoya, Judy

arXiv.org Artificial Intelligence

Foundation models (FMs) promise to generalize medical imaging, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided), and architecture influence embedding quality, hindering the selection of optimal encoders for specific radiology tasks. To address this, we evaluate vision encoders from eight medical and general-domain FMs for chest X-ray analysis. We benchmark classification (pneumothorax, cardiomegaly) and segmentation (pneumothorax, cardiac boundary) using linear probing and fine-tuning. Our results show that domain-specific pre-training provides a significant advantage; medical FMs consistently outperformed general-domain models in linear probing, establishing superior initial feature quality. However, feature utility is highly task-dependent. Pre-trained embeddings were strong for global classification and segmenting salient anatomy (e.g., heart). In contrast, for segmenting complex, subtle pathologies (e.g., pneumothorax), all FMs performed poorly without significant fine-tuning, revealing a critical gap in localizing subtle disease. Subgroup analysis showed FMs use confounding shortcuts (e.g., chest tubes for pneumothorax) for classification, a strategy that fails for precise segmentation. We also found that expensive text-image alignment is not a prerequisite; image-only (RAD-DINO) and label-supervised (Ark+) FMs were among top performers. Notably, a supervised, end-to-end baseline remained highly competitive, matching or exceeding the best FMs on segmentation tasks. These findings show that while medical pre-training is beneficial, architectural choices (e.g., multi-scale) are critical, and pre-trained features are not universally effective, especially for complex localization tasks where supervised models remain a strong alternative.
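Linear probing, as used in the abstract above, trains only a linear classifier on top of features from a frozen encoder. The sketch below illustrates the idea on toy data. Everything in it is an assumption made for illustration: `frozen_encoder` is a hypothetical stand-in for a pre-trained vision encoder, and the data, learning rate, and epoch count are arbitrary.

```python
import math
import random


def frozen_encoder(x):
    """Hypothetical stand-in for a pre-trained encoder; its weights are never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed features with full-batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for k, fk in enumerate(f):
                grad_w[k] += err * fk
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of samples whose thresholded linear score matches the label."""
    hits = 0
    for f, y in zip(features, labels):
        z = sum(wi * fi for wi, fi in zip(w, f)) + b
        hits += int((z > 0) == (y == 1))
    return hits / len(labels)


# Toy data: the label depends on x0 + x1, which the frozen features expose linearly.
rng = random.Random(0)
inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
labels = [1 if x0 + x1 > 0 else 0 for x0, x1 in inputs]

features = [frozen_encoder(x) for x in inputs]  # computed once, encoder untouched
w, b = train_linear_probe(features, labels)
acc = accuracy(w, b, features, labels)
print(acc)
```

Fine-tuning differs in that the encoder's own parameters would also receive gradient updates, so probe accuracy is a measure of the quality of the features as they come out of pre-training.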


Chest X-ray Pneumothorax Segmentation Using EfficientNet-B4 Transfer Learning in a U-Net Architecture

Roque, Alvaro Aranibar, Sebastian, Helga

arXiv.org Artificial Intelligence

Abstract: Pneumothorax, the abnormal accumulation of air in the pleural space, can be life-threatening if undetected. Chest X-rays are the first-line diagnostic tool, but small cases may be subtle. We propose an automated deep-learning pipeline using a U-Net with an EfficientNet-B4 encoder to segment pneumothorax regions. Trained on the SIIM-ACR dataset with data augmentation and a combined binary cross-entropy plus Dice loss, the model achieved an IoU of 0.7008 and Dice score of 0.8241 on the independent PTX-498 dataset. These results demonstrate that the model can accurately localize pneumothoraces and support radiologists. Pneumothorax is the abnormal accumulation of air in the pleural space, which can arise spontaneously or due to trauma or medical procedures. Early detection is critical, as even small pneumothoraces may rapidly progress to life-threatening conditions. Clinical examination alone may miss subtle cases [1], making chest X-rays the standard diagnostic tool.
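The metrics and loss named in this abstract can be illustrated with a short sketch. It is not the authors' implementation: masks are flattened to plain lists, the two loss terms are weighted equally, and the smoothing constant `eps` is an arbitrary choice; all function names are hypothetical. For a single pair of masks, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), and the reported values (0.7008 and 0.8241) are consistent with that identity.

```python
import math


def dice_and_iou(pred_mask, true_mask):
    """Dice and IoU for two binary masks given as flat lists of 0/1."""
    inter = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    union = total - inter
    if total == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match here
    return 2 * inter / total, inter / union


def dice_from_iou(iou):
    """Identity relating the two scores for one pair of masks."""
    return 2 * iou / (1 + iou)


def bce_dice_loss(probs, targets, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss, equally weighted (an assumption)."""
    n = len(probs)
    clipped = [min(max(p, eps), 1 - eps) for p in probs]  # keep log() finite
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / n
    inter = sum(p * t for p, t in zip(probs, targets))
    soft_dice = (2 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return bce + (1 - soft_dice)


pred = [1, 1, 1, 0, 0, 0]
true = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, true)
print(round(dice, 4), iou)               # 0.6667 0.5
print(round(dice_from_iou(0.7008), 4))   # 0.8241
```

Combining the two terms is a common design choice for small targets: the cross-entropy term gives a per-pixel gradient, while the Dice term is driven by overlap and is less dominated by the large background region.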



Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Bouzid, Kenza, Bannur, Shruthi, Meissen, Felix, de Castro, Daniel Coelho, Schwaighofer, Anton, Alvarez-Valle, Javier, Hyland, Stephanie L.

arXiv.org Artificial Intelligence

Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on model behaviour through steering, demonstrating directional control over generations with mixed success. Our results reveal practical and methodological challenges, yet they offer initial insights into the internal concepts learned by MAIRA-2 - marking a step toward deeper mechanistic understanding and interpretability of a radiology-adapted multimodal large language model, and paving the way for improved model transparency. We release the trained SAEs and interpretations: https://huggingface.co/microsoft/maira-2-sae.


LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

Jahangir, Md. Zihad Bin, Kabir, Muhammad Ashad, Akter, Sumaiya, Jahan, Israt, Chau, Minh

arXiv.org Artificial Intelligence

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.
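For readers unfamiliar with the ROUGE-L metric reported above, the sketch below shows the core computation: the longest common subsequence (LCS) between a generated report and a reference, turned into precision, recall, and an F-score. It assumes lowercased whitespace tokenization and a balanced F1; published evaluations may use a different tokenizer or recall weighting, so scores from this sketch are not comparable to the paper's. The function names and the example sentences are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as a balanced F1 over LCS precision and recall (assumed weighting)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "No acute cardiopulmonary abnormality"
candidate = "No acute abnormality"
score = rouge_l_f1(candidate, reference)
print(round(score, 4))  # 0.8571
```

Because the score rewards in-order word overlap only, it measures lexical similarity rather than clinical correctness, which is why report generation work often reports clinical metrics alongside it.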