Mahmood, Usman
Self-Supervised Mental Disorder Classifiers via Time Reversal
Iqbal, Zafar, Mahmood, Usman, Fu, Zening, Plis, Sergey
Data scarcity is a notable problem, especially in the medical domain, due to patient data laws. Efficient pre-training techniques could therefore help combat this problem. In this paper, we demonstrate that a model trained to predict the time direction of functional neuroimaging data can help in downstream tasks, for example, distinguishing patients from healthy controls in fMRI data. We train a deep neural network on independent components derived from fMRI data using independent component analysis (ICA). The network learns the time direction of the ICA-based data. This pre-trained model is then further trained to classify brain disorders in different datasets. Through various experiments, we show that learning time direction helps a model learn some causal relations in fMRI data, which leads to faster convergence; consequently, the model generalizes well in downstream classification tasks even with fewer data records.
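The time-direction pretext task described in this abstract can be illustrated with a minimal sketch of how labeled training pairs might be built. This is not the authors' pipeline: the function name `make_time_direction_pairs`, the array shape (subjects, components, time points), and the label convention (1 = forward, 0 = reversed) are assumptions chosen for illustration, and random noise stands in for real ICA time courses.

```python
import numpy as np


def make_time_direction_pairs(timecourses, rng):
    """Build a binary 'time direction' dataset from ICA time courses.

    timecourses: array of shape (n_subjects, n_components, n_timepoints).
    Returns (x, y), where y == 1 marks a forward-in-time sample and
    y == 0 marks the same sample reversed along the time axis.
    Shape and label convention are illustrative assumptions.
    """
    forward = timecourses
    backward = timecourses[:, :, ::-1]  # flip only the time axis
    x = np.concatenate([forward, backward], axis=0)
    y = np.concatenate([
        np.ones(len(forward), dtype=np.int64),
        np.zeros(len(backward), dtype=np.int64),
    ])
    perm = rng.permutation(len(x))  # shuffle so the labels are interleaved
    return x[perm], y[perm]


rng = np.random.default_rng(0)
# Hypothetical stand-in for ICA time courses: 4 subjects, 53 components, 140 time points.
ica_timecourses = rng.standard_normal((4, 53, 140))
x, y = make_time_direction_pairs(ica_timecourses, rng)
print(x.shape, y.shape)
```

A classifier pre-trained to predict `y` from `x` could then be fine-tuned on a labeled disorder dataset, which is the transfer step the abstract describes.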
Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems
Mahmood, Usman, Shrestha, Robik, Bates, David D. B., Mannelli, Lorenzo, Corrias, Giuseppe, Erdi, Yusuf, Kanan, Christopher
Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing and localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
Whole MILC: generalizing learned dynamics across tasks, datasets, and populations
Mahmood, Usman, Rahman, Md Mahfuzur, Fedorov, Alex, Lewis, Noah, Fu, Zening, Calhoun, Vince D., Plis, Sergey M.
Behavioral changes are the earliest signs of a mental disorder, but arguably, the dynamics of brain function are affected even earlier. Consequently, the spatio-temporal structure of disorder-specific dynamics is crucial for early diagnosis and for understanding the disorder mechanism. A common way of learning discriminatory features relies on training a classifier and evaluating feature importance. Classical classifiers based on handcrafted features are quite powerful, but suffer from the curse of dimensionality when applied to the large input dimensions of spatio-temporal data. Deep learning algorithms could handle the problem, and model introspection could highlight discriminatory spatio-temporal regions, but they need far more samples to train. In this paper, we present a novel self-supervised training schema that reinforces whole sequence mutual information local to context (whole MILC). We pre-train the whole MILC model on unlabeled and unrelated healthy control data. We test our model on three different disorders, (i) schizophrenia, (ii) autism, and (iii) Alzheimer's, and on four different studies. Our algorithm outperforms existing self-supervised pre-training methods and provides classification results competitive with classical machine learning algorithms. Importantly, whole MILC enables attribution of subject diagnosis to specific spatio-temporal regions in the fMRI signal.
Transfer Learning of fMRI Dynamics
Mahmood, Usman, Rahman, Md Mahfuzur, Fedorov, Alex, Fu, Zening, Plis, Sergey
As a mental disorder progresses, it may affect brain structure, but brain function expressed in brain dynamics is affected much earlier. Capturing the moment when brain dynamics express the disorder is crucial for early diagnosis. The traditional approach to this problem via training classifiers either proceeds from handcrafted features or requires large datasets to combat the m ≫ n problem, in which a high-dimensional fMRI volume has only a single label that carries the learning signal. Large datasets may not be available for a study of each disorder, or rare disorder types or subpopulations may not warrant them. In this paper, we demonstrate a self-supervised pre-training method that enables us to pre-train directly on fMRI dynamics of healthy control subjects and transfer the learning to much smaller datasets of schizophrenia. Not only do we enable classification of the disorder directly based on fMRI dynamics in small data, but we also significantly speed up the learning when possible. This is encouraging evidence of informative transfer learning across datasets and diagnostic categories.
Fusion Subspace Clustering: Full and Incomplete Data
Pimentel-Alarcón, Daniel L., Mahmood, Usman
Modern inference and learning often hinge on identifying low-dimensional structures that approximate large-scale data. Subspace clustering achieves this through a union of linear subspaces. However, in contemporary applications data is increasingly incomplete, rendering standard (full-data) methods inapplicable. On the other hand, existing incomplete-data methods present major drawbacks, like lifting an already high-dimensional problem, or requiring a super-polynomial number of samples. Motivated by this, we introduce a new subspace clustering algorithm inspired by fusion penalties. The main idea is to permanently assign each datum to a subspace of its own, and minimize the distance between the subspaces of all data, so that subspaces of the same cluster get fused together. Our approach is entirely new to both full and missing data, and unlike other methods, it directly allows noise, it requires no liftings, it allows low, high, and even full-rank data, it approaches optimal (information-theoretic) sampling rates, and it does not rely on other methods such as low-rank matrix completion to handle missing data. Furthermore, our extensive experiments on both real and synthetic data show that our approach performs comparably to the state-of-the-art with complete data, and dramatically better if data is missing.