Subedar, Mahesh
Parameter-Efficient Active Learning for Foundational models
Narayanan, Athmanarayanan Lakshmi, Krishnan, Ranganath, Machireddy, Amrutha, Subedar, Mahesh
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks. This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework, to advance the sample selection process in extremely budget-constrained classification tasks. The focus on image datasets, known for their out-of-distribution characteristics, adds a layer of complexity and relevance to our study. Through a detailed evaluation, we illustrate the improved AL performance on these challenging datasets, highlighting the strategic advantage of merging parameter-efficient fine-tuning methods with foundation models. This contributes to the broader discourse on optimizing AL strategies, presenting a promising avenue for future exploration in leveraging foundation models for efficient and effective data annotation in specialized domains.
Improving Robustness and Efficiency in Active Learning with Contrastive Loss
Krishnan, Ranganath, Ahuja, Nilesh, Sinha, Alok, Subedar, Mahesh, Tickoo, Omesh, Iyer, Ravi
This paper introduces supervised contrastive active learning (SCAL), which leverages the contrastive loss for active learning in a supervised setting. We propose efficient query strategies in active learning to select unbiased and informative data samples with diverse feature representations. We demonstrate that our proposed method reduces sampling bias and achieves state-of-the-art accuracy and model calibration in an active learning setup, with query computation 11x faster than CoreSet and 26x faster than Bayesian active learning by disagreement. Our method yields well-calibrated models even with imbalanced datasets. We also evaluate robustness to dataset shift and out-of-distribution data in an active learning setup and demonstrate that our proposed SCAL method outperforms high-performing compute-intensive methods by a bigger margin (on average, 8.9% higher AUROC for out-of-distribution detection and 7.2% lower ECE under dataset shift).
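The abstract above reports calibration as ECE (expected calibration error). As a reading aid, here is a minimal sketch of how ECE is commonly computed from per-sample confidences and correctness flags. The function name, the equal-width binning, and the default of 10 bins are assumptions for illustration and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - confidence| over equal-width bins.

    confidences: predicted-class probabilities in [0, 1]
    correct:     booleans, True where the prediction matched the label
    n_bins:      number of equal-width confidence bins (assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; confidence 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means the reported confidence tracks the observed accuracy more closely, which is the sense in which the abstract reports "lower ECE under dataset shift".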
MOPED: Efficient priors for scalable variational inference in Bayesian deep neural networks
Krishnan, Ranganath, Subedar, Mahesh, Tickoo, Omesh
Variational inference for Bayesian deep neural networks (DNNs) requires specifying priors and approximate posterior distributions for neural network weights. Specifying meaningful weight priors is a challenging problem, particularly when scaling variational inference to deeper architectures involving a high-dimensional weight space. We propose the Bayesian MOdel Priors Extracted from Deterministic DNN (MOPED) method for stochastic variational inference, which chooses meaningful prior distributions over the weight space using deterministic weights derived from pretrained DNNs of equivalent architecture. We evaluate the proposed approach on multiple datasets and real-world application domains, with model architectures of varying complexity, to demonstrate that MOPED enables scalable variational inference for Bayesian DNNs. The proposed method achieves faster training convergence and provides reliable uncertainty quantification without compromising the accuracy provided by the deterministic DNNs. We also propose hybrid architectures for Bayesian DNNs, in which deterministic and variational layers are combined to balance computational complexity during the prediction phase while providing the benefits of Bayesian inference. We will release the source code for this work.
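The idea of deriving priors from deterministic pretrained weights can be sketched as follows. This is only an illustration under stated assumptions: a mean-field Gaussian parameterization, a unit prior variance, and a posterior standard deviation initialized to a fraction `delta` of each weight's magnitude. None of these specifics are given in the abstract, and `moped_init` is a hypothetical helper name.

```python
import math


def softplus(rho):
    """Map an unconstrained parameter rho to a positive standard deviation."""
    return math.log1p(math.exp(rho))


def moped_init(pretrained_weights, delta=0.1, min_sigma=1e-6):
    """Build per-weight Gaussian priors and initial posteriors (assumed scheme).

    Returns two lists:
      priors:     (mean, std) with the mean set to the pretrained weight
                  and an assumed unit standard deviation
      posteriors: (mean, rho) where softplus(rho) = delta * |w|
    """
    priors, posteriors = [], []
    for w in pretrained_weights:
        priors.append((w, 1.0))  # prior centered on the deterministic weight
        sigma = max(delta * abs(w), min_sigma)  # avoid a zero std for w == 0
        rho = math.log(math.expm1(sigma))  # inverse softplus
        posteriors.append((w, rho))
    return priors, posteriors
```

For example, `moped_init([0.5, -2.0])` centers both the prior and the initial posterior on 0.5 and -2.0, with initial posterior standard deviations of 0.05 and 0.2. Starting the variational parameters near a trained solution is one plausible reading of why the abstract reports faster training convergence.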
Uncertainty aware multimodal activity recognition with Bayesian inference
Subedar, Mahesh, Krishnan, Ranganath, Meyer, Paulo Lopez, Tickoo, Omesh, Huang, Jonathan
Deep neural networks (DNNs) provide state-of-the-art results for a multitude of applications, but the use of DNNs for multimodal audiovisual applications is still an unsolved problem. Current approaches that combine audiovisual information do not consider inherent uncertainty or leverage the true classification confidence associated with each modality in the final decision. Our contribution in this work is to apply Bayesian variational inference to DNNs for audiovisual activity recognition and quantify model uncertainty along with principled confidence. We propose a novel approach that combines deterministic and variational layers to estimate model uncertainty and principled confidence. Our experiments with in- and out-of-distribution samples selected from a subset of the Moments-in-Time (MiT) dataset show a more reliable confidence measure compared to the non-Bayesian baseline. We also demonstrate that the uncertainty estimates obtained from this framework can identify out-of-distribution data on the UCF101 and MiT datasets. In the multimodal setting, the proposed framework improved precision-recall AUC by 14.4% on the subset of the MiT dataset compared to the non-Bayesian baseline.
BAR: Bayesian Activity Recognition using variational inference
Krishnan, Ranganath, Subedar, Mahesh, Tickoo, Omesh
Uncertainty estimation in deep neural networks is essential for designing reliable and robust AI systems. Applications such as video surveillance for identifying suspicious activities are designed with deep neural networks (DNNs), but DNNs do not provide uncertainty estimates. Capturing reliable uncertainty estimates in safety- and security-critical applications will help to establish trust in the AI system. Our contribution is to apply a Bayesian deep learning framework to visual activity recognition and quantify model uncertainty along with principled confidence. We use variational inference while training the Bayesian DNNs to infer the approximate posterior distribution over model parameters, and perform Monte Carlo sampling on the posterior of the model parameters to obtain the predictive distribution. We show that Bayesian inference applied to DNNs provides reliable confidence measures for the visual activity recognition task compared to conventional DNNs. We also show that our method improves the visual activity recognition precision-recall score by 6% compared to the non-Bayesian baseline. We evaluate our models on the Moments-In-Time (MiT) activity recognition dataset by selecting a subset of in- and out-of-distribution video samples.
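The Monte Carlo step described above (sampling model parameters from the approximate posterior and averaging the resulting predictions) can be sketched on a toy model. This sketch assumes a linear softmax classifier with an independent Gaussian posterior per weight. The function names, the toy model, and the use of predictive entropy as the uncertainty score are illustrative assumptions, not the paper's architecture.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predictive(x, post_mean, post_std, n_samples=50, rng=None):
    """Average class probabilities over weight samples from the posterior.

    post_mean, post_std: n_classes x n_features lists with the mean and
    standard deviation of each weight's Gaussian posterior (assumed form).
    """
    rng = rng or random.Random(0)  # fixed seed for repeatable sketches
    avg = [0.0] * len(post_mean)
    for _ in range(n_samples):
        logits = []
        for mu_row, sd_row in zip(post_mean, post_std):
            # Draw one weight vector for this class from the posterior.
            w = [rng.gauss(mu, sd) for mu, sd in zip(mu_row, sd_row)]
            logits.append(sum(wi * xi for wi, xi in zip(w, x)))
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def predictive_entropy(probs):
    """Entropy of the averaged prediction; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With all posterior standard deviations set to zero, every sample is identical and the result reduces to the deterministic prediction. Non-zero deviations spread the samples, and the entropy of the averaged prediction gives one simple uncertainty score that could be thresholded to flag out-of-distribution inputs.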