Kumar, Sayantan
Multimodal Variational Autoencoder: a Barycentric View
Qiu, Peijie, Zhu, Wenhui, Kumar, Sayantan, Chen, Xiwen, Sun, Xiaotong, Yang, Jin, Razi, Abolfazl, Wang, Yalin, Sotiras, Aristeidis
Multiple signal modalities, such as vision and sound, are naturally present in real-world phenomena. Recently, there has been growing interest in learning generative models, in particular variational autoencoders (VAEs), for multimodal representation learning, especially in the case of missing modalities. The primary goal of these models is to learn modality-invariant and modality-specific representations that characterize information across multiple modalities. Previous attempts at multimodal VAEs approach this mainly through the lens of experts, aggregating unimodal inference distributions with a product of experts (PoE), a mixture of experts (MoE), or a combination of both. In this paper, we provide an alternative generic and theoretical formulation of multimodal VAEs through the lens of barycenters. We first show that PoE and MoE are specific instances of barycenters, derived by minimizing the asymmetric weighted KL divergence to unimodal inference distributions. Our novel formulation extends these two barycenters to a more flexible choice by considering different types of divergences. In particular, we explore the Wasserstein barycenter defined by the 2-Wasserstein distance, which better preserves the geometry of unimodal distributions by capturing both modality-specific and modality-invariant representations compared to the KL divergence. Empirical studies on three multimodal benchmarks demonstrate the effectiveness of the proposed method.
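For diagonal-Gaussian unimodal posteriors, all three aggregations named above have closed forms. A minimal Python sketch (NumPy only; the function names and the choice of weights are illustrative, not taken from the paper's code):

import numpy as np

def poe(mus, sigmas, weights):
    # Weighted product of Gaussian experts: precisions add. This is the
    # barycenter minimizing the weighted reverse KL, sum_m w_m KL(q || q_m).
    precisions = weights[:, None] / sigmas**2        # (M, D)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, np.sqrt(var)

def moe_sample(mus, sigmas, weights, n):
    # Mixture of Gaussian experts: pick a modality, then sample from it.
    # The mixture is the barycenter minimizing the weighted forward KL,
    # sum_m w_m KL(q_m || q).
    idx = np.random.choice(len(weights), size=n, p=weights)
    return np.random.normal(mus[idx], sigmas[idx])

def w2_barycenter(mus, sigmas, weights):
    # 2-Wasserstein barycenter of diagonal Gaussians: per dimension, a
    # weighted average of the means and of the standard deviations.
    w = weights[:, None]
    return (w * mus).sum(axis=0), (w * sigmas).sum(axis=0)

Here mus and sigmas are (M, D) arrays holding the M unimodal posterior means and standard deviations, and weights is a length-M probability vector.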
HiMAL: A Multimodal Hierarchical Multi-task Auxiliary Learning framework for predicting and explaining Alzheimer disease progression
Kumar, Sayantan, Yu, Sean, Michelson, Andrew, Kannampallil, Thomas, Payne, Philip
Objective: We aimed to develop and validate HiMAL (Hierarchical, Multi-task Auxiliary Learning), a novel multimodal framework that predicts cognitive composite functions as auxiliary tasks to estimate the longitudinal risk of transition from Mild Cognitive Impairment (MCI) to Alzheimer Disease (AD). Methods: HiMAL utilized multimodal longitudinal visit data, including imaging features, cognitive assessment scores, and clinical variables from MCI patients in the Alzheimer Disease Neuroimaging Initiative (ADNI) dataset, to predict at each visit whether an MCI patient would progress to AD within the next 6 months. Performance of HiMAL was compared with state-of-the-art single-task and multi-task baselines using area under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC) metrics. An ablation study was performed to assess the impact of each input modality on model performance. Additionally, longitudinal explanations regarding the risk of disease progression were provided to interpret the predicted cognitive decline. Results: Out of 634 MCI patients (mean [IQR] age: 72.8 [67-78], 60% men), 209 (32%) progressed to AD. HiMAL showed better prediction performance than all single-modality, single-task baselines (AUROC = 0.923 [0.915-0.937]; AUPRC = 0.623 [0.605-0.644]; all p<0.05). Ablation analysis highlighted that imaging features and cognitive scores contributed the most to the prediction of disease progression. Discussion: Clinically informative model explanations anticipate cognitive decline 6 months in advance, aiding clinicians in assessing future disease progression. HiMAL relies on routinely collected EHR variables for proximal (6-month) prediction of AD onset, indicating its translational potential for point-of-care monitoring and management of high-risk patients.
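As a rough illustration of the hierarchical multi-task idea (the layer sizes, head structure, and loss weighting below are assumptions for this sketch, not the paper's reported architecture), a shared sequence encoder can feed auxiliary cognitive-score heads whose outputs are passed on to the main progression head:

import torch
import torch.nn as nn

class HiMALSketch(nn.Module):
    # Hypothetical sketch: a shared LSTM encodes longitudinal visits;
    # auxiliary heads regress cognitive composite scores, and their
    # outputs feed (hierarchically) the main head predicting 6-month
    # MCI-to-AD progression risk at each visit.
    def __init__(self, n_features, hidden=64, n_aux=2):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.aux_heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_aux))
        self.main_head = nn.Linear(hidden + n_aux, 1)

    def forward(self, visits):                      # visits: (B, T, n_features)
        h, _ = self.encoder(visits)                 # (B, T, hidden)
        aux = torch.cat([head(h) for head in self.aux_heads], dim=-1)
        risk = torch.sigmoid(self.main_head(torch.cat([h, aux], dim=-1)))
        return risk.squeeze(-1), aux                # (B, T), (B, T, n_aux)

def himal_loss(risk, aux, y, aux_targets, lam=0.3):
    # Main progression loss plus a weighted auxiliary regression loss;
    # lam is an assumed trade-off hyperparameter.
    return (nn.functional.binary_cross_entropy(risk, y)
            + lam * nn.functional.mse_loss(aux, aux_targets))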
Improving Normative Modeling for Multi-modal Neuroimaging Data using mixture-of-product-of-experts variational autoencoders
Kumar, Sayantan, Payne, Philip, Sotiras, Aristeidis
Normative models in neuroimaging learn the distribution of brain patterns in a healthy population and estimate how subjects with a disease such as Alzheimer's Disease (AD) deviate from the norm. Existing variational autoencoder (VAE)-based normative models using multimodal neuroimaging data aggregate information from multiple modalities by estimating the product or the average of unimodal latent posteriors. This can often lead to uninformative joint latent distributions, which affect the estimation of subject-level deviations. In this work, we addressed these limitations by adopting the Mixture-of-Product-of-Experts (MoPoE) technique, which allows better modeling of the joint latent posterior. Our model labeled subjects as outliers by calculating deviations from the multimodal latent space. Further, we identified which latent dimensions and brain regions were associated with abnormal deviations due to AD pathology.
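A minimal sketch of the MoPoE aggregation for diagonal-Gaussian experts (the function names and the inclusion of a standard-normal prior expert are assumptions of this sketch): the joint posterior is a uniform mixture over the product-of-experts posteriors of every non-empty modality subset.

from itertools import chain, combinations
import numpy as np

def poe(mus, logvars):
    # Product of Gaussian experts plus a N(0, I) prior expert:
    # precisions add, means are precision-weighted.
    precision = np.exp(-logvars).sum(axis=0) + 1.0
    var = 1.0 / precision
    mu = var * (np.exp(-logvars) * mus).sum(axis=0)
    return mu, np.log(var)

def mopoe(mus, logvars):
    # Mixture-of-Product-of-Experts: uniform mixture over the PoE
    # posteriors of all non-empty subsets of the M modalities.
    M = len(mus)
    subsets = chain.from_iterable(combinations(range(M), k) for k in range(1, M + 1))
    components = [poe(mus[list(s)], logvars[list(s)]) for s in subsets]
    weights = np.full(len(components), 1.0 / len(components))
    return components, weights  # sample: pick a component, then draw a Gaussian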
Normative Modeling using Multimodal Variational Autoencoders to Identify Abnormal Brain Structural Patterns in Alzheimer Disease
Kumar, Sayantan, Payne, Philip, Sotiras, Aristeidis
Normative modelling is an emerging method for understanding the underlying heterogeneity within brain disorders like Alzheimer Disease (AD) by quantifying how each patient deviates from the expected normative pattern learned from a healthy control distribution. Since AD is a multifactorial disease involving multiple biological pathways, multimodal magnetic resonance imaging (MRI) data can provide complementary information about the disease heterogeneity. However, existing deep learning-based normative models for multimodal MRI data use unimodal autoencoders with a single encoder and decoder, which may fail to capture the relationships between brain measurements extracted from different MRI modalities. In this work, we propose a multimodal variational autoencoder (mmVAE)-based normative modelling framework that can capture the joint distribution between different modalities to identify abnormal brain structural patterns in AD. Our multimodal framework takes as input FreeSurfer-processed brain region volumes from T1-weighted (cortical and subcortical) and T2-weighted (hippocampal) scans of cognitively normal participants to learn the morphological characteristics of the healthy brain. The estimated normative model is then applied to Alzheimer Disease (AD) patients to quantify the deviations in brain volumes and identify the abnormal brain structural patterns associated with the different AD stages. Our experimental results show that modeling the joint distribution between the multiple MRI modalities generates deviation maps that are more sensitive to disease staging within AD, correlate better with patient cognition, and result in a higher number of brain regions with statistically significant deviations compared to a unimodal baseline model with all modalities concatenated as a single input.
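A minimal sketch of how such subject-level deviation maps are typically derived (the normalization scheme and names here are assumptions, not the paper's exact procedure): standardize each patient's per-region reconstruction error against the healthy-control error distribution from the trained model.

import numpy as np

def deviation_zscores(x_patient, recon_patient, x_hc, recon_hc):
    # Inputs: (n_subjects, n_regions) arrays of observed and reconstructed
    # regional volumes, from a normative model trained on healthy controls.
    err_hc = x_hc - recon_hc
    mu, sd = err_hc.mean(axis=0), err_hc.std(axis=0) + 1e-8
    return ((x_patient - recon_patient) - mu) / sd   # per-region z-scores

Regions whose |z| exceeds a chosen threshold (e.g. 1.96) can then be flagged as statistically significant deviations for that patient.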