Jadhav, Kshitij
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Kumar, Raja, Singhal, Raghav, Kulkarni, Pranamya, Mehta, Deval, Jadhav, Kshitij
Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities thereby capturing shared relations between them. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains. It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research. Our code is publicly available at https://github.com/RaghavSinghal10/M3CoL.
MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions
Mankash, Tavish, Kota, V. S. Chaithanya, De, Anish, Prakash, Praveen, Jadhav, Kshitij
Hospitals in India still rely on handwritten medical records despite the availability of Electronic Medical Records (EMR), complicating statistical analysis and record retrieval. Handwritten records pose a unique challenge, requiring specialized data for training models to recognize medications and their recommendation patterns. While traditional handwriting recognition approaches employ 2-D LSTMs, recent studies have explored using Multimodal Large Language Models (MLLMs) for OCR tasks. Building on this approach, we focus on extracting medication names and dosages from simulated medical records. Our methodology MIRAGE (Multimodal Identification and Recognition of Annotations in indian GEneral prescriptions) involves fine-tuning the QWEN VL, LLaVA 1.6 and Idefics2 models on 743,118 high resolution simulated medical record images-fully annotated from 1,133 doctors across India. Our approach achieves 82% accuracy in extracting medication names and dosages.
One Shot GANs for Long Tail Problem in Skin Lesion Dataset using novel content space assessment metric
Deo, Kunal, Mehta, Deval, Jadhav, Kshitij
Long tail problems frequently arise in the medical field, particularly due to the scarcity of medical data for rare conditions. This scarcity often leads to models overfitting on such limited samples. Consequently, when training models on datasets with heavily skewed classes, where the number of samples varies significantly-- a problem emerges. Training on such imbalanced datasets can result in selective detection, where a model accurately identifies images belonging to the majority classes but disregards those from minority classes. This causes the model to lack generalizability, preventing its use on newer data. This poses a significant challenge in developing image detection and diagnosis models for medical image datasets. To address this challenge, the One Shot GANs model was employed to augment the tail class of HAM10000 dataset by generating additional samples. Furthermore, to enhance accuracy, a novel metric tailored to suit One Shot GANs was utilized.
Replace and Report: NLP Assisted Radiology Report Generation
Kale, Kaveri, Bhattacharyya, pushpak, Jadhav, Kshitij
Clinical practice frequently uses medical imaging for diagnosis and treatment. A significant challenge for automatic radiology report generation is that the radiology reports are long narratives consisting of multiple sentences for both abnormal and normal findings. Therefore, applying conventional image captioning approaches to generate the whole report proves to be insufficient, as these are designed to briefly describe images with short sentences. We propose a template-based approach to generate radiology reports from radiographs. Our approach involves the following: i) using a multilabel image classifier, produce the tags for the input radiograph; ii) using a transformer-based model, generate pathological descriptions (a description of abnormal findings seen on radiographs) from the tags generated in step (i); iii) using a BERT-based multi-label text classifier, find the spans in the normal report template to replace with the generated pathological descriptions; and iv) using a rule-based system, replace the identified span with the generated pathological description. We performed experiments with the two most popular radiology report datasets, IU Chest X-ray and MIMIC-CXR and demonstrated that the BLEU-1, ROUGE-L, METEOR, and CIDEr scores are better than the State-of-the-Art models by 25%, 36%, 44% and 48% respectively, on the IU X-RAY dataset. To the best of our knowledge, this is the first attempt to generate chest X-ray radiology reports by first creating small sentences for abnormal findings and then replacing them in the normal report template.