Mourao-Miranda, Janaina
Identifying latent disease factors differently expressed in patient subgroups using group factor analysis
Ferreira, Fabio S., Ashburner, John, Bouzigues, Arabella, Suksasilp, Chatrin, Russell, Lucy L., Foster, Phoebe H., Ferry-Bolder, Eve, van Swieten, John C., Jiskoot, Lize C., Seelaar, Harro, Sanchez-Valle, Raquel, Laforce, Robert, Graff, Caroline, Galimberti, Daniela, Vandenberghe, Rik, de Mendonca, Alexandre, Tiraboschi, Pietro, Santana, Isabel, Gerhard, Alexander, Levin, Johannes, Sorbi, Sandro, Otto, Markus, Pasquier, Florence, Ducharme, Simon, Butler, Chris R., Le Ber, Isabelle, Finger, Elizabeth, Tartaglia, Maria C., Masellis, Mario, Rowe, James B., Synofzik, Matthis, Moreno, Fermin, Borroni, Barbara, Kaski, Samuel, Rohrer, Jonathan D., Mourao-Miranda, Janaina
The heterogeneity of neurological and mental health disorders has been a key confound to disease understanding, treatment development and outcome prediction, as patient populations are thought to include multiple disease pathways that selectively respond to treatment (Kapur et al., 2012). These challenges are reflected in poor treatment outcomes; for instance, in depression, only approximately 40% of patients remit after first-line antidepressant treatment or psychotherapy (Amick et al., 2015; Cuijpers et al., 2014; Fava and Davidson, 1996; Trivedi et al., 2006). Diagnostic categories in psychiatry have historically been defined based on signs and symptoms, prioritising diagnostic agreement between clinicians rather than underlying biological mechanisms (Freedman et al., 2013; Robins and Guze, 1970). As a result, the usefulness of supervised machine learning methods as diagnostic tools for mental health disorders (i.e., classifying patients vs. healthy controls) is questionable, since they may simply inherit the flaws of current diagnostic categories. Additional challenges in neurological and mental health disorders are comorbidity (i.e., individuals with one disorder often develop another disorder during their lifespan) and the fact that different disorders can share similar symptoms (Kessler et al., 2005). To address the limitations of current diagnostic categories in psychiatry, the National Institute of Mental Health launched the Research Domain Criteria (RDoC) framework in 2009 (https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc) in an attempt to move beyond diagnostic categories and ground psychiatry in neurobiological constructs that combine multiple levels of measures or sources of information (Insel et al., 2010).
Multivariate methods that do not rely on diagnostic categories, such as Canonical Correlation Analysis (CCA) and related methods, have been widely used to uncover latent disease dimensions capturing associations between brain imaging and non-imaging data (e.g., self-report questionnaires, cognitive tests and genetics). The identified latent dimensions provide information on how a set of non-imaging features (e.g.
A hierarchical Bayesian model to find brain-behaviour associations in incomplete data sets
Ferreira, Fabio S., Mihalik, Agoston, Adams, Rick A., Ashburner, John, Mourao-Miranda, Janaina
Canonical Correlation Analysis (CCA) and its regularised versions have been widely used in the neuroimaging community to uncover multivariate associations between two data modalities (e.g., brain imaging and behaviour). However, these methods have inherent limitations: (1) statistical inferences about the associations are often not robust; (2) the associations within each data modality are not modelled; (3) missing values need to be imputed or removed. Group Factor Analysis (GFA) is a hierarchical model that addresses the first two limitations by providing Bayesian inference and modelling modality-specific associations. Here, we propose an extension of GFA that handles missing data, and highlight that GFA can be used as a predictive model. We applied GFA to synthetic and real data consisting of brain connectivity and non-imaging measures from the Human Connectome Project (HCP). In synthetic data, GFA uncovered the underlying shared and specific factors and correctly predicted the unobserved data modalities in complete and incomplete data sets. In the HCP data, we identified four relevant shared factors, capturing associations between mood, alcohol and drug use, cognition, demographics and psychopathological measures and the default mode, frontoparietal control, dorsal and ventral networks and insula, as well as two factors describing associations within brain connectivity. In addition, GFA predicted a set of non-imaging measures from brain connectivity. These findings were consistent in complete and incomplete data sets, and replicated previous findings in the literature. GFA is a promising tool that can be used to uncover associations between and within multiple data modalities in benchmark datasets (such as HCP), and that can be easily extended to more complex models to solve more challenging tasks.
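To illustrate the linear-Gaussian factor model that underlies GFA, the sketch below simulates two modalities generated from a shared factor and modality-specific factors, then predicts one modality from the other. It is a minimal sketch under stated assumptions: the loadings `W` and noise variances `psi` are treated as known (GFA infers them, together with the group-wise sparsity, by Bayesian inference, which is not shown), and the dimensions, sparsity pattern and the helper `predict_unobserved` are hypothetical. Missing entries are handled here simply by conditioning the latent factors on the observed entries only; this is one standard way a linear-Gaussian factor model can accommodate incomplete data, not necessarily the exact scheme proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy sizes: modality 1 ("brain"), modality 2 ("behaviour"), 3 factors.
d1, d2, K, n = 20, 8, 3, 300

# Loadings W_m (d_m x K). Zeroed columns mimic the group-wise sparsity GFA learns:
# factor 0 is shared, factor 1 is specific to modality 1, factor 2 to modality 2.
W1 = rng.normal(size=(d1, K)); W1[:, 2] = 0.0
W2 = rng.normal(size=(d2, K)); W2[:, 1] = 0.0
W = np.vstack([W1, W2])                 # stacked loadings, (d1 + d2) x K
psi = np.full(d1 + d2, 0.1)             # diagonal noise variances (assumed known)

Z = rng.normal(size=(n, K))             # latent factors z ~ N(0, I)
X = Z @ W.T + rng.normal(size=(n, d1 + d2)) * np.sqrt(psi)

def predict_unobserved(x, observed, W, psi):
    """Hypothetical helper: posterior mean of z from observed entries, then reconstruct x.

    For x = W z + eps, z ~ N(0, I), eps ~ N(0, diag(psi)):
    E[z | x_obs] = (I + W_o^T Psi_o^-1 W_o)^-1 W_o^T Psi_o^-1 x_obs
    """
    Wo = W[observed]
    Pinv = 1.0 / psi[observed]
    A = np.eye(W.shape[1]) + Wo.T @ (Wo * Pinv[:, None])
    z_mean = np.linalg.solve(A, Wo.T @ (Pinv * x[observed]))
    return W @ z_mean

# Treat modality 2 as missing for every subject and predict it from modality 1.
observed = np.zeros(d1 + d2, dtype=bool)
observed[:d1] = True
X2_hat = np.array([predict_unobserved(x, observed, W, psi)[d1:] for x in X])

r = np.corrcoef(X2_hat.ravel(), X[:, d1:].ravel())[0, 1]
print("correlation of predicted vs. true modality 2:", round(r, 2))
```

Because the factor specific to modality 2 cannot be inferred from modality 1, only the part of modality 2 driven by the shared factor is predictable; this is why shared factors are what make cross-modal prediction possible.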
Finding the needle in high-dimensional haystack: A tutorial on canonical correlation analysis
Wang, Hao-Ting, Smallwood, Jonathan, Mourao-Miranda, Janaina, Xia, Cedric Huchuan, Satterthwaite, Theodore D., Bassett, Danielle S., Bzdok, Danilo
Since the beginning of the 21st century, the size, breadth, and granularity of data in biology and medicine have grown rapidly. In neuroscience, for example, studies with thousands of subjects are becoming more common, providing extensive phenotyping at the behavioral, neural, and genomic levels with hundreds of variables. The complexity of such big data repositories offers new opportunities and poses new challenges for investigating brain, cognition, and disease. Canonical correlation analysis (CCA) is a prototypical family of methods for wrestling with and harvesting insight from such rich datasets. This doubly-multivariate tool can simultaneously consider two variable sets from different modalities to uncover essential hidden associations. Our primer discusses the rationale, promises, and pitfalls of CCA in biomedicine.
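To make the two-view setting concrete, the following is a minimal NumPy sketch of classical CCA, assuming two data matrices `X` (n x p) and `Y` (n x q) with more samples than features. It whitens each view and takes the singular value decomposition of the whitened cross-covariance, whose singular values are the canonical correlations. The function names, the tiny ridge term `reg` and the simulated data with one shared latent variable are illustrative assumptions, not code from the primer.

```python
import numpy as np

def inv_sqrt(C, eps=1e-8):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

def cca(X, Y, reg=1e-6):
    """Classical CCA for two views X (n x p) and Y (n x q).

    Returns canonical weights (as columns) and canonical correlations.
    reg adds a small ridge so the within-view covariances stay invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance; singular values = canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U        # canonical weights for X
    Wy = Ky @ Vt.T     # canonical weights for Y
    return Wx, Wy, s

# Simulated data: one latent variable z drives both views, plus independent noise.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, 10)) + rng.normal(size=(n, 10))   # e.g. "brain" features
Y = z @ rng.normal(size=(1, 5)) + rng.normal(size=(n, 5))     # e.g. "behaviour" features

Wx, Wy, corrs = cca(X, Y)
u = (X - X.mean(axis=0)) @ Wx[:, 0]   # first canonical variate of X
v = (Y - Y.mean(axis=0)) @ Wy[:, 0]   # first canonical variate of Y
print("canonical correlations:", np.round(corrs, 2))
print("correlation of first pair:", round(np.corrcoef(u, v)[0, 1], 2))
```

In the high-dimensional regime the primer is concerned with (more features than subjects), the within-view covariance matrices become singular and this unregularised estimate overfits, which is why regularised or sparse variants of CCA are typically used instead.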
Interpreting weight maps in terms of cognitive or clinical neuroscience: nonsense?
Schrouff, Jessica, Mourao-Miranda, Janaina
Since machine learning models have been applied to neuroimaging data, researchers have drawn conclusions from the derived weight maps. In particular, weight maps of classifiers between two conditions are often described as a proxy for the underlying signal differences between the conditions. However, recent studies have suggested that such weight maps cannot reliably recover the sources of the neural signals and can even lead to false positives (FP). In this work, we used semi-simulated data from electrocorticography (ECoG) to investigate how the signal-to-noise ratio and sparsity of the neural signal affect the similarity between signal and weights. We show that not all cases produce FP and that, in most cases, FP features are unlikely to receive a high weight.
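The concern raised by earlier studies can be reproduced with a toy simulation (not the semi-simulated ECoG setup used in this work). Below, a feature that carries no condition information but shares noise with an informative feature receives a large weight from a linear decoder, a so-called suppressor variable. The decoder is ordinary least-squares regression on labels coded as -1/+1, used as a simple stand-in for a linear classifier; all variable names and noise levels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Condition labels coded as -1/+1.
y = rng.choice([-1.0, 1.0], size=n)

# Shared noise d (e.g. a common artefact) contaminates feature 1 and is the only
# content of feature 2, so feature 2 carries no information about the condition.
d = rng.normal(size=n)
e = rng.normal(size=n)
x1 = y + d + 0.5 * e   # condition signal + shared noise + private noise
x2 = d                 # pure noise, uncorrelated with y

# Least-squares linear decoder with an intercept column.
A = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w = coef[:2]           # weights for x1 and x2

print("weights:", np.round(w, 2))
print("corr(x2, y):", round(np.corrcoef(x2, y)[0, 1], 3))
```

In expectation the two weights are +0.8 and -0.8 for this setup: the decoder uses `x2` to cancel the shared noise in `x1`, so a weight map alone would flag `x2` as relevant even though it is uncorrelated with the labels. Whether such false positives arise, and how large their weights are, depends on the signal-to-noise ratio and noise structure of the data, which is what this work investigates with ECoG.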