Mehta, Deval
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Kumar, Raja, Singhal, Raghav, Kulkarni, Pranamya, Mehta, Deval, Jadhav, Kshitij
Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities, thereby capturing shared relations between them. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains. It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research. Our code is publicly available at https://github.com/RaghavSinghal10/M3CoL.
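The core idea, aligning a mixture of two samples from one modality with both of their counterparts in the other modality, can be sketched as follows. This is a minimal illustrative sketch: the soft-target formulation, temperature, and mixing coefficient are assumptions, not the paper's exact loss.

```python
import numpy as np

def mixup_contrastive_loss(x_a, x_b, lam=0.6, temperature=0.1):
    """Sketch of a Mixup-based contrastive alignment (illustrative only).

    x_a, x_b: (N, D) embeddings from two modalities.
    A mixed sample lam*x_a[i] + (1-lam)*x_a[j] is aligned with BOTH
    x_b[i] and x_b[j] via soft targets lam and (1-lam), so the loss
    rewards shared relations beyond strict one-to-one pairs.
    """
    rng = np.random.default_rng(0)
    n = x_a.shape[0]
    perm = rng.permutation(n)
    mixed = lam * x_a + (1 - lam) * x_a[perm]            # mix within modality A
    # cosine similarities between mixed A-samples and all B-samples
    mixed = mixed / np.linalg.norm(mixed, axis=1, keepdims=True)
    x_b = x_b / np.linalg.norm(x_b, axis=1, keepdims=True)
    logits = mixed @ x_b.T / temperature                 # (N, N)
    # soft targets: lam on the original index, (1-lam) on the mixed-in index
    targets = lam * np.eye(n) + (1 - lam) * np.eye(n)[perm]
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1).mean()     # soft cross-entropy
```

In a training loop this term would be added to the supervised losses from the fusion and unimodal prediction modules described above.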
One Shot GANs for Long Tail Problem in Skin Lesion Dataset using novel content space assessment metric
Deo, Kunal, Mehta, Deval, Jadhav, Kshitij
Long tail problems frequently arise in the medical field, particularly due to the scarcity of medical data for rare conditions. This scarcity often leads to models overfitting on such limited samples. Consequently, a problem emerges when training models on datasets with heavily skewed classes, where the number of samples varies significantly across classes. Training on such imbalanced datasets can result in selective detection, where a model accurately identifies images belonging to the majority classes but disregards those from minority classes. This causes the model to lack generalizability, preventing its use on newer data, and poses a significant challenge in developing image detection and diagnosis models for medical image datasets. To address this challenge, the One Shot GANs model was employed to augment the tail class of the HAM10000 dataset by generating additional samples. Furthermore, to enhance accuracy, a novel content-space assessment metric tailored to One Shot GANs was utilized.
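The augmentation goal above, topping up tail classes with GAN-generated samples until the class distribution is balanced, reduces to counting how many synthetic images each class needs. The helper below is a hypothetical sketch; the class labels in the usage note and the choice of the majority-class count as the target are assumptions, not details from the paper.

```python
from collections import Counter

def synthetic_samples_needed(labels, target=None):
    """Return, per class, how many synthetic (e.g. GAN-generated) samples
    are needed to reach `target` (defaults to the majority-class count)."""
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}
```

For example, `synthetic_samples_needed(["nv"] * 5 + ["df"])` reports that the tail class `"df"` needs 4 generated images to match the 5 samples of `"nv"`.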
Revamping AI Models in Dermatology: Overcoming Critical Challenges for Enhanced Skin Lesion Diagnosis
Mehta, Deval, Betz-Stablein, Brigid, Nguyen, Toan D, Gal, Yaniv, Bowling, Adrian, Haskett, Martin, Sashindranath, Maithili, Bonnington, Paul, Mar, Victoria, Soyer, H Peter, Ge, Zongyuan
The surge in developing deep learning models for diagnosing skin lesions through image analysis is notable, yet their clinical adoption faces challenges. Current dermatology AI models have limitations: a limited number of possible diagnostic outputs, lack of real-world testing on uncommon skin lesions, inability to detect out-of-distribution images, and over-reliance on dermoscopic images. To address these, we present an All-In-One Hierarchical-Out of Distribution-Clinical Triage (HOT) model. For a clinical image, our model generates three outputs: a hierarchical prediction, an alert for out-of-distribution images, and a recommendation for dermoscopy if the clinical image alone is insufficient for diagnosis. When the recommendation is pursued, it integrates both clinical and dermoscopic images to deliver a final diagnosis. Extensive experiments on a representative cutaneous lesion dataset demonstrate the effectiveness and synergy of each component within our framework. Our versatile model provides valuable decision support for lesion diagnosis and sets a promising precedent for medical AI applications.
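The three-output triage flow can be sketched as a simple decision function. The thresholds, probability inputs, and return strings below are illustrative assumptions, not the model's actual calibration or interface.

```python
def triage(class_probs, p_ood, conf_threshold=0.5, ood_threshold=0.5):
    """Sketch of a HOT-style triage decision (thresholds are assumptions).

    class_probs: dict mapping diagnosis label -> predicted probability.
    p_ood: estimated probability that the image is out-of-distribution.
    """
    # 1) out-of-distribution alert takes priority
    if p_ood > ood_threshold:
        return "out-of-distribution alert"
    # 2) confident clinical-image prediction -> diagnosis
    label, conf = max(class_probs.items(), key=lambda kv: kv[1])
    if conf >= conf_threshold:
        return f"diagnosis: {label}"
    # 3) otherwise recommend acquiring a dermoscopic image
    return "recommend dermoscopy"
```

In the full model the third branch would feed both the clinical and dermoscopic images into a fused predictor rather than returning a string.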
Privacy-preserving Early Detection of Epileptic Seizures in Videos
Mehta, Deval, Sivathamboo, Shobi, Simpson, Hugh, Kwan, Patrick, O'Brien, Terence, Ge, Zongyuan
In this work, we contribute towards the development of video-based epileptic seizure classification by introducing a novel framework (SETR-PKD), which could achieve privacy-preserved early detection of seizures in videos. Specifically, our framework has two significant components - (1) it is built upon optical flow features extracted from the video of a seizure, which encode the seizure motion semiotics while preserving the privacy of the patient; (2) it utilizes a transformer-based progressive knowledge distillation, where the knowledge is gradually distilled from networks trained on a longer portion of video samples to the ones which will operate on shorter portions. Thus, our proposed framework addresses the limitations of current approaches, which compromise the privacy of patients by directly operating on the RGB video of a seizure and impede real-time detection by utilizing the full video sample to make a prediction. Our SETR-PKD framework could detect tonic-clonic seizures (TCSs) in a privacy-preserving manner with an accuracy of 83.9% while they are only halfway into their progression. Our data and code are available at https://github.com/DevD1092/seizure-detection
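The distillation step relies on the standard temperature-softened knowledge distillation objective. The sketch below shows that generic loss only; the progressive chaining across networks trained on successively shorter video portions is specific to SETR-PKD and is only described above, not implemented here.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2 — the
    standard KD loss. In progressive KD, the 'teacher' here would be a
    network trained on a longer portion of the seizure video."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float((p_t * (np.log(p_t) - np.log(p_s))).sum() * T * T)
```

The loss is zero when student and teacher distributions match and positive otherwise, pushing the short-clip network toward the longer-clip network's predictive distribution.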
Leukocyte Classification using Multimodal Architecture Enhanced by Knowledge Distillation
Yang, Litao, Mehta, Deval, Mahapatra, Dwarikanath, Ge, Zongyuan
Recently, many automated white blood cell (WBC), or leukocyte, classification techniques have been developed. However, all of these methods utilize only a single-modality microscopic image, i.e., either blood smear or fluorescence-based, thus missing the potential for better learning from multimodal images. In this work, we develop an efficient multimodal architecture based on a first-of-its-kind multimodal WBC dataset for the task of WBC classification. Specifically, our proposed idea is developed in two steps - 1) first, we learn modality-specific independent subnetworks inside a single network; 2) we further enhance the learning capability of the independent subnetworks by distilling knowledge from high-complexity independent teacher networks. With this, our proposed framework can achieve high performance while maintaining low complexity on a multimodal dataset. Our unique contribution is two-fold - 1) we present a first-of-its-kind multimodal WBC dataset for WBC classification; 2) we develop a high-performing multimodal architecture which is also efficient and low in complexity.
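One way to realize modality-specific independent subnetworks inside a single network is block-diagonal weight masking, so each modality's features pass through their own slice of the shared weights. This mechanism is an assumption made here for illustration, not necessarily the paper's construction.

```python
import numpy as np

def split_subnetwork_forward(x_blood, x_fluor, W):
    """Sketch: two independent subnetworks inside one weight matrix W,
    realized by zeroing the cross-modality blocks (illustrative only)."""
    d = W.shape[0] // 2
    mask = np.zeros_like(W)
    mask[:d, :d] = 1                  # blood-smear subnetwork
    mask[d:, d:] = 1                  # fluorescence subnetwork
    x = np.concatenate([x_blood, x_fluor])
    return (W * mask) @ x             # each output half sees one modality
```

Because the off-diagonal blocks are masked out, the first half of the output depends only on the blood-smear input and the second half only on the fluorescence input, giving independent subnetworks that can each be distilled from its own teacher.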
Towards Automated and Marker-less Parkinson Disease Assessment: Predicting UPDRS Scores using Sit-stand videos
Mehta, Deval, Asif, Umar, Hao, Tian, Bilal, Erhan, Von Cavallar, Stefan, Harrer, Stefan, Rogers, Jeffrey
This paper presents a novel deep learning enabled, video-based analysis framework for assessing the Unified Parkinson's Disease Rating Scale (UPDRS) that can be used in the clinic or at home. We report results from comparing the performance of the framework to that of trained clinicians on a population of 32 Parkinson's disease (PD) patients. In-person clinical assessments by trained neurologists are used as the ground truth for training our framework and for comparing performance. We find that the standard sit-to-stand activity can be used to evaluate the UPDRS sub-scores of bradykinesia (BRADY) and posture instability and gait disorders (PIGD). For BRADY we find F1-scores of 0.75 using our framework compared to 0.50 for the video-based rater clinicians, while for PIGD we find 0.78 for the framework and 0.45 for the video-based rater clinicians. We believe our proposed framework has the potential to provide clinically acceptable endpoints of PD at greater granularity without imposing burdens on patients and clinicians, which empowers a variety of use cases such as passive tracking of PD progression in spaces such as nursing homes, in-home self-assessment, and enhanced telemedicine.
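The F1-scores compared above are the harmonic mean of precision and recall. A minimal sketch follows; the example counts in the usage note are illustrative, not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from
    true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, 3 true positives with 1 false positive and 3 false negatives gives precision 0.75 and recall 0.5, hence F1 = 0.6.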