Mahapatra, Dwarikanath
Gene42: Long-Range Genomic Foundation Model With Dense Attention
Vishniakov, Kirill, Amor, Boulbaba Ben, Tekin, Engin, ElNaker, Nancy A., Viswanathan, Karthik, Medvedev, Aleksandr, Singh, Aahan, Nadeem, Maryam, Sayeed, Mohammad Amaan, Kanithi, Praveenkumar, Magalhaes, Tiago, Vassilieva, Natalia, Mahapatra, Dwarikanath, Pimentel, Marco, Khan, Shadab
We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to handle context lengths of up to 192,000 base pairs (bp) at single-nucleotide resolution. Gene42 models use a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, the models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allows comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense-attention model capable of handling such long context lengths in genomics, challenging state-space models, which often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks demonstrate state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.
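As a rough illustration of how such a checkpoint might be used, the sketch below scores the perplexity of a DNA sequence with a causal LM loaded through Hugging Face transformers. The model id "inceptionai/Gene42" is a placeholder inferred from the hub organization cited above; the actual checkpoint names, tokenizer, and maximum context length should be taken from the model cards.

```python
# Minimal sketch: scoring a DNA sequence with a Gene42-style causal LM.
# The model id below is a guess based on the hub organization named in the
# abstract; check huggingface.co/inceptionai for the real checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inceptionai/Gene42"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

sequence = "ACGT" * 1024  # 4,096 bp, the original pretraining context length
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean next-token cross-entropy.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.3f}")
```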
Characterizing Continual Learning Scenarios and Strategies for Audio Analysis
Bhatt, Ruchi, Kumari, Pratibha, Mahapatra, Dwarikanath, Saddik, Abdulmotaleb El, Saini, Mukesh
Audio analysis is useful in many application scenarios. State-of-the-art audio analysis approaches assume that the data distribution at training and deployment time will be the same. However, due to various real-life environmental factors, the data may drift in its distribution or new classes may appear over time, so a model trained once might not perform adequately. In this paper, we characterize continual learning (CL) approaches for audio analysis, intended to tackle the catastrophic forgetting that arises from such drifts. As there is no CL dataset for audio analysis, we use the DCASE 2020 to 2023 datasets to create various CL scenarios for audio-based monitoring tasks. We investigate the following CL and non-CL approaches: EWC, LwF, SI, GEM, A-GEM, GDumb, Replay, Naive, Cumulative, and Joint Training. The study is beneficial for researchers and practitioners working in audio analysis who are developing adaptive models. We observe that Replay achieves better results than the other methods on the DCASE challenge data, with an accuracy of 70.12% in the domain-incremental scenario and 96.98% in the class-incremental scenario.
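Since Replay was the strongest strategy in these experiments, the following is a minimal sketch of experience replay with a reservoir buffer, assuming a PyTorch classifier. Buffer capacity, batch sizes, and the mixing scheme are illustrative, not the paper's configuration.

```python
# Minimal sketch of experience replay for continual audio classification:
# a reservoir buffer keeps a bounded sample of past tasks, and each new
# batch is mixed with replayed examples before the gradient step.
import random

import torch
import torch.nn.functional as F


class ReservoirBuffer:
    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.data = []   # list of (feature, label) pairs
        self.seen = 0    # total examples observed so far

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            # Reservoir sampling: every example seen so far is kept with
            # equal probability capacity / seen.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def replay_step(model, optimizer, x, y, buffer, replay_size=32):
    """One training step on the current batch mixed with replayed examples."""
    for xi, yi in zip(x, y):                  # store only the new examples
        buffer.add(xi.detach(), yi.detach())
    if buffer.data:
        rx, ry = buffer.sample(replay_size)
        x, y = torch.cat([x, rx]), torch.cat([y, ry])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A domain- or class-incremental run would call replay_step over batches from each task in sequence, keeping the same buffer across tasks.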
Leukocyte Classification using Multimodal Architecture Enhanced by Knowledge Distillation
Yang, Litao, Mehta, Deval, Mahapatra, Dwarikanath, Ge, Zongyuan
Recently, many automated white blood cell (WBC), or leukocyte, classification techniques have been developed. However, all of these methods rely on a single-modality microscopic image, i.e., either blood smear or fluorescence-based, thus missing the potential for better learning from multimodal images. In this work, we develop an efficient multimodal architecture, based on a first-of-its-kind multimodal WBC dataset, for the task of WBC classification. Specifically, our proposed idea is developed in two steps: 1) we first learn modality-specific independent subnetworks inside a single network; 2) we further enhance the learning capability of the independent subnetworks by distilling knowledge from high-complexity independent teacher networks. With this, our proposed framework achieves high performance while maintaining low complexity on a multimodal dataset. Our contribution is two-fold: 1) we present a first-of-its-kind multimodal WBC dataset for WBC classification; 2) we develop a high-performing multimodal architecture that is also efficient and low in complexity.
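The distillation step can be illustrated with the standard Hinton-style objective below: cross-entropy on the labels plus a temperature-softened KL term toward the teacher's logits. The exact losses and weighting used in the paper may differ; this is a generic sketch.

```python
# Minimal sketch of the loss for distilling a high-capacity teacher into a
# modality-specific student subnetwork. Temperature and alpha are
# illustrative defaults, not the paper's settings.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on ground truth plus KL to the softened teacher output."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    return alpha * ce + (1.0 - alpha) * kl
```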
sZoom: A Framework for Automatic Zoom into High Resolution Surveillance Videos
Saini, Mukesh, Guthier, Benjamin, Kuang, Hao, Mahapatra, Dwarikanath, Saddik, Abdulmotaleb El
Current cameras are capable of recording high-resolution video. While viewing on a mobile device, a user can manually zoom into this high-resolution video to get a more detailed view of objects and activities. However, manual zooming is not suitable for surveillance and monitoring: it is tiring to continuously zoom into various regions of the video, and while viewing one region, the operator may miss activities in other regions. In this paper, we propose sZoom, a framework to automatically zoom into a high-resolution surveillance video. The proposed framework selectively zooms into the sensitive regions of the video to present details of the scene while still preserving the overall context required for situation assessment. A multivariate Gaussian penalty is introduced to ensure full coverage of the scene. The method achieves near real-time performance through a number of timing optimizations. An extensive user study shows that, while watching a full HD video on a mobile device, the system enhances a security operator's efficiency in understanding the details of the scene by 99% on average compared to a scaled version of the original high-resolution video. The produced video also received 46% higher ratings for usefulness in a surveillance task.
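The coverage mechanism can be sketched as a selection loop that down-weights the saliency around recently shown regions with a 2-D Gaussian bump, so later zooms favor unvisited areas. The parameter values and the exact penalty formulation below are assumptions for illustration, not the paper's.

```python
# Minimal sketch of a multivariate Gaussian penalty for zoom-region
# selection: regions near previously shown centers are suppressed in the
# saliency map so that subsequent zooms cover the rest of the scene.
import numpy as np


def gaussian_penalty(shape, center, sigma=(80.0, 80.0)):
    """2-D Gaussian bump (peak 1.0) centered at a previously zoomed location."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-(((ys - cy) ** 2) / (2 * sigma[0] ** 2)
                    + ((xs - cx) ** 2) / (2 * sigma[1] ** 2)))


def next_zoom_center(saliency, visited_centers):
    """Pick the most salient location after penalizing already-shown regions."""
    penalized = saliency.copy()
    for c in visited_centers:
        penalized *= 1.0 - 0.9 * gaussian_penalty(saliency.shape, c)
    return np.unravel_index(np.argmax(penalized), penalized.shape)
```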
Deep multiscale convolutional feature learning for weakly supervised localization of chest pathologies in X-ray images
Sedai, Suman, Mahapatra, Dwarikanath, Ge, Zongyuan, Chakravorty, Rajib, Garnavi, Rahil
Localization of chest pathologies in chest X-ray images is a challenging task because of their varying sizes and appearances. We propose a novel weakly supervised method to localize chest pathologies using class-aware deep multiscale feature learning. Our method leverages intermediate feature maps from CNN layers at different stages of a deep network during the training of a classification model using image-level annotations of pathologies. During the training phase, a set of layer relevance weights is learned for each pathology class, and the CNN is optimized to perform pathology classification via a convex combination of feature maps from both shallow and deep layers using the learned weights. During the test phase, to localize the predicted pathology, a multiscale attention map is obtained by a convex combination of class activation maps from each stage using the layer relevance weights learned during training. We validated our method on 112,000 X-ray images and compared it with state-of-the-art localization methods. We experimentally demonstrate that the proposed weakly supervised method improves the localization performance of small pathologies such as nodules and masses while giving comparable performance for larger pathologies, e.g., cardiomegaly.
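A minimal sketch of the test-phase combination, assuming per-stage class activation maps already upsampled to a common resolution: a softmax over learned relevance scores yields convex per-class coefficients, matching the convex combination described above. Tensor names and shapes are illustrative assumptions.

```python
# Minimal sketch of combining class activation maps (CAMs) from several CNN
# stages with learned layer relevance weights. The softmax keeps the
# per-class weights positive and summing to one, so the mix is convex.
import torch
import torch.nn.functional as F


def multiscale_attention(cams, layer_scores, class_idx):
    """
    cams: list of per-stage CAM tensors, each of shape (num_classes, H, W),
          already upsampled to a common resolution.
    layer_scores: learnable tensor (num_classes, num_stages) of relevance scores.
    Returns the (H, W) attention map for class_idx.
    """
    weights = F.softmax(layer_scores[class_idx], dim=0)      # convex coefficients
    stacked = torch.stack([cam[class_idx] for cam in cams])  # (num_stages, H, W)
    return (weights[:, None, None] * stacked).sum(dim=0)
```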