Linear regression models have been successfully used to function estimation and model selection in high-dimensional data analysis. However, most existing methods are built on least squares with the mean square error (MSE) criterion, which are sensitive to outliers and their performance may be degraded for heavy-tailed noise. In this paper, we go beyond this criterion by investigating the regularized modal regression from a statistical learning viewpoint. A new regularized modal regression model is proposed for estimation and variable selection, which is robust to outliers, heavy-tailed noise, and skewed noise. On the theoretical side, we establish the approximation estimate for learning the conditional mode function, the sparsity analysis for variable selection, and the robustness characterization. On the application side, we applied our model to successfully improve the cognitive impairment prediction using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort data.
Alzheimer's disease (AD) is the most common neurodegenerative disease in older people. Despite considerable efforts to find a cure for AD, there is a 99.6% failure rate of clinical trials for AD drugs, likely because AD patients cannot easily be identified at early stages. This project investigated machine learning approaches to predict the clinical state of patients in future years to benefit AD research. Clinical data from 1737 patients was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and was processed using the "All-Pairs" technique, a novel methodology created for this project involving the comparison of all possible pairs of temporal data points for each patient. This data was then used to train various machine learning models. Models were evaluated using 7-fold cross-validation on the training dataset and confirmed using data from a separate testing dataset (110 patients). A neural network model was effective (mAUC = 0.866) at predicting the progression of AD on a month-by-month basis, both in patients who were initially cognitively normal and in patients suffering from mild cognitive impairment. Such a model could be used to identify patients at early stages of AD and who are therefore good candidates for clinical trials for AD therapeutics.
Recent results in coupled or temporal graphical models offer schemes for estimating the relationship structure between features when the data come from related (but distinct) longitudinal sources. A novel application of these ideas is for analyzing group-level differences, i.e., in identifying if trends of estimated objects (e.g., covariance or precision matrices) are different across disparate conditions (e.g., gender or disease). Often, poor effect sizes make detecting the differential signal over the full set of features difficult: for example, dependencies between only a subset of features may manifest differently across groups. In this work, we first give a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates. We then generalize scan statistics to graph structures, to search over distinct subsets of features (graph partitions) whose temporal dependency structure may show statistically significant group-wise differences. We theoretically analyze the Family Wise Error Rate (FWER) and bounds on Type 1 and Type 2 error. On a cohort of individuals with risk factors for Alzheimer's disease (but otherwise cognitively healthy), we find scientifically interesting group differences where the default analysis, i.e., models estimated on the full graph, do not survive reasonable significance thresholds.
In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict the key metrics of Alzheimer's Disease progression (MMSE, ADAS-Cog13, CDRSB and CS) based on each patient's previous visits. We start by learning a population-level model using multi-modal data from previously seen patients using the base Gaussian Process (GP) regression. Then, this model is adapted sequentially over time to a new patient using domain adaptive GPs to form the patient's pGP. We show that this new approach, together with an auto-regressive formulation, leads to significant improvements in forecasting future clinical status and cognitive scores for target patients when compared to modeling the population with traditional GPs.
Joint models for longitudinal and time-to-event data are commonly used in longitudinal studies to forecast disease trajectories over time. Despite the many advantages of joint modeling, the standard forms suffer from limitations that arise from a fixed model specification and computational difficulties when applied to large datasets. We adopt a deep learning approach to address these limitations, enhancing existing methods with the flexibility and scalability of deep neural networks while retaining the benefits of joint modeling. Using data from the Alzheimer's Disease Neuroimaging Institute, we show improvements in performance and scalability compared to traditional methods.