In applications of graphical models, we typically have more information than just the samples themselves. A prime example is the estimation of brain connectivity networks based on fMRI data, where in addition to the samples themselves, the spatial positions of the measurements are readily available. With particular regard for this application, we are thus interested in ways to incorporate additional knowledge most effectively into graph estimation. Our approach to this is to make neighborhood selection receptive to additional knowledge by strengthening the role of the tuning parameters. We demonstrate that this concept (i) can improve reproducibility, (ii) is computationally convenient and efficient, and (iii) carries a lucid Bayesian interpretation. We specifically show that the approach provides effective estimations of brain connectivity graphs from fMRI data. However, providing a general scheme for the inclusion of additional knowledge, our concept is expected to have applications in a wide range of domains.
Given genetic variations and various phenotypical traits, such as Magnetic Resonance Imaging (MRI) features, we consider two important and related tasks in biomedical research: i)to select genetic and phenotypical markers for disease diagnosis and ii) to identify associations between genetic and phenotypical data. These two tasks are tightly coupled because underlying associations between genetic variations and phenotypical features contain the biological basis for a disease. While a variety of sparse models have been applied for disease diagnosis and canonical correlation analysis and its extensions have bee widely used in association studies (e.g., eQTL analysis), these two tasks have been treated separately. To unify these two tasks, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset for the study of Alzheimer's Disease (AD). Our method identifies biologically meaningful relationships between genetic variations, MRI features, and AD status, and achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.
We consider the problem of precision matrix estimation where, due to extraneous confounding of the underlying precision matrix, the data are independent but not identically distributed. While such confounding occurs in many scientific problems, our approach is inspired by recent neuroscientific research suggesting that brain function, as measured using functional magnetic resonance imagine (fMRI), is susceptible to confounding by physiological noise such as breathing and subject motion. Following the scientific motivation, we propose a graphical model, which in turn motivates a joint nonparametric estimator. We provide theoretical guarantees for the consistency and the convergence rate of the proposed estimator. In addition, we demonstrate that the optimization of the proposed estimator can be transformed into a series of linear programming problems, and thus be efficiently solved in parallel. Empirical results are presented using simulated and real brain imaging data, which suggest that our approach improves precision matrix estimation, as compared to baselines, when confounding is present.
We investigate the choice of tuning parameters for a Bayesian multi-level group lasso model developed for the joint analysis of neuroimaging and genetic data. The regression model we consider relates multivariate phenotypes consisting of brain summary measures (volumetric and cortical thickness values) to single nucleotide polymorphism (SNPs) data and imposes penalization at two nested levels, the first corresponding to genes and the second corresponding to SNPs. Associated with each level in the penalty is a tuning parameter which corresponds to a hyperparameter in the hierarchical Bayesian formulation. Following previous work on Bayesian lassos we consider the estimation of tuning parameters through either hierarchical Bayes based on hyperpriors and Gibbs sampling or through empirical Bayes based on maximizing the marginal likelihood using a Monte Carlo EM algorithm. For the specific model under consideration we find that these approaches can lead to severe overshrinkage of the regression parameter estimates in the high-dimensional setting or when the genetic effects are weak. We demonstrate these problems through simulation examples and study an approximation to the marginal likelihood which sheds light on the cause of this problem. We then suggest an alternative approach based on the widely applicable information criterion (WAIC), an asymptotic approximation to leave-one-out cross-validation that can be computed conveniently within an MCMC framework.
Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of functional connectivity graph.