medical image
The supplementary materials includes a detailed description of implementation details for experiments
We use BLIP-2 models built on the FLAN-T5 language model family. We use the same padding side as the FLAN-T5 models. We use a batch size of 8 for all datasets and models. The Q-former is kept in full precision. To produce decompositions, we use multinomial beam search sampling with 5 beams and a top-p of 0.95.
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained networks on ImageNet and vision-language foundation models trained on web-scale data are the prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed through a combinatorial graph-matching objective, and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.
- Health & Medicine > Diagnostic Medicine > Imaging (0.83)
- Health & Medicine > Therapeutic Area (0.58)
Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback
Generative models capable of precisely capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing (annotated) medical images at scale. Despite their potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually-realistic medical images, the clinical plausibility of these images may be called into question. Domain-agnostic scores, such as FID score, precision, and recall, cannot incorporate clinical knowledge and are, therefore, not suitable for assessing clinical sensibility. Additionally, there are numerous unpredictable ways in which generative models may fail to synthesize clinically plausible images, making it challenging to anticipate potential failures and design automated scores for their detection.
Exploring Adversarial Watermarking in Transformer-Based Models: Transferability and Robustness Against Defense Mechanism for Medical Images
Sadik, Rifat, Rahman, Tanvir, Bhattacharjee, Arpan, Halder, Bikash Chandra, Hossain, Ismail, Aoyon, Rifat Sarker, Alam, Md. Golam Rabiul, Uddin, Jia
Deep learning models have shown remarkable success in dermatological image analysis, offering potential for automated skin disease diagnosis. Previously, convolutional neural network(CNN) based architectures have achieved immense popularity and success in computer vision (CV) based task like skin image recognition, generation and video analysis. But with the emergence of transformer based models, CV tasks are now are nowadays carrying out using these models. Vision Transformers (ViTs) is such a transformer-based models that have shown success in computer vision. It uses self-attention mechanisms to achieve state-of-the-art performance across various tasks. However, their reliance on global attention mechanisms makes them susceptible to adversarial perturbations. This paper aims to investigate the susceptibility of ViTs for medical images to adversarial watermarking-a method that adds so-called imperceptible perturbations in order to fool models. By generating adversarial watermarks through Projected Gradient Descent (PGD), we examine the transferability of such attacks to CNNs and analyze the performance defense mechanism -- adversarial training. Results indicate that while performance is not compromised for clean images, ViTs certainly become much more vulnerable to adversarial attacks: an accuracy drop of as low as 27.6%. Nevertheless, adversarial training raises it up to 90.0%.
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Delaware > New Castle County > Newark (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
Provenance-Driven Reliable Semantic Medical Image Vector Reconstruction via Lightweight Blockchain-Verified Latent Fingerprints
Rasheed, Mohsin, Al-Mamun, Abdullah
Medical imaging is essential for clinical diagnosis, yet real-world data frequently suffers from corruption, noise, and potential tampering, challenging the reliability of AI-assisted interpretation. Conventional reconstruction techniques prioritize pixel-level recovery and may produce visually plausible outputs while compromising anatomical fidelity, an issue that can directly impact clinical outcomes. We propose a semantic-aware medical image reconstruction framework that integrates high-level latent embeddings with a hybrid U-Net architecture to preserve clinically relevant structures during restoration. To ensure trust and accountability, we incorporate a lightweight blockchain-based provenance layer using scale-free graph design, enabling verifiable recording of each reconstruction event without imposing significant overhead. Extensive evaluation across multiple datasets and corruption types demonstrates improved structural consistency, restoration accuracy, and provenance integrity compared with existing approaches. By uniting semantic-guided reconstruction with secure traceability, our solution advances dependable AI for medical imaging, enhancing both diagnostic confidence and regulatory compliance in healthcare environments.
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.34)
- Asia > China (0.04)
- Oceania > Australia (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Rethinking Bias in Generative Data Augmentation for Medical AI: a Frequency Recalibration Method
Liu, Chi, Liu, Jincheng, Zhu, Congcong, Wang, Minghao, Shen, Sheng, Gu, Jia, Zhu, Tianqing, Zhou, Wanlei
Developing Medical AI relies on large datasets and easily suffers from data scarcity. Generative data augmentation (GDA) using AI generative models offers a solution to synthesize realistic medical images. However, the bias in GDA is often underestimated in medical domains, with concerns about the risk of introducing detrimental features generated by AI and harming downstream tasks. This paper identifies the frequency misalignment between real and synthesized images as one of the key factors underlying unreliable GDA and proposes the Frequency Recalibration (FreRec) method to reduce the frequency distributional discrepancy and thus improve GDA. FreRec involves (1) Statistical High-frequency Replacement (SHR) to roughly align high-frequency components and (2) Reconstructive High-frequency Mapping (RHM) to enhance image quality and reconstruct high-frequency details. Extensive experiments were conducted in various medical datasets, including brain MRIs, chest X-rays, and fundus images. The results show that FreRec significantly improves downstream medical image classification performance compared to uncalibrated AI-synthesized samples. FreRec is a standalone post-processing step that is compatible with any generative model and can integrate seamlessly with common medical GDA pipelines.
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area (0.95)
- Information Technology > Security & Privacy (0.88)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.95)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Virginia (0.04)
- Europe > Ireland > Munster (0.04)
- Asia > Singapore (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)