Identifying latent disease factors differently expressed in patient subgroups using group factor analysis

Ferreira, Fabio S., Ashburner, John, Bouzigues, Arabella, Suksasilp, Chatrin, Russell, Lucy L., Foster, Phoebe H., Ferry-Bolder, Eve, van Swieten, John C., Jiskoot, Lize C., Seelaar, Harro, Sanchez-Valle, Raquel, Laforce, Robert, Graff, Caroline, Galimberti, Daniela, Vandenberghe, Rik, de Mendonca, Alexandre, Tiraboschi, Pietro, Santana, Isabel, Gerhard, Alexander, Levin, Johannes, Sorbi, Sandro, Otto, Markus, Pasquier, Florence, Ducharme, Simon, Butler, Chris R., Ber, Isabelle Le, Finger, Elizabeth, Tartaglia, Maria C., Masellis, Mario, Rowe, James B., Synofzik, Matthis, Moreno, Fermin, Borroni, Barbara, Kaski, Samuel, Rohrer, Jonathan D., Mourao-Miranda, Janaina

arXiv.org Machine Learning 

The heterogeneity of neurological and mental health disorders has been a key confound to disease understanding, treatment development and outcome prediction, as patient populations are thought to include multiple disease pathways that selectively respond to treatment (Kapur et al., 2012). These challenges are reflected in poor treatment outcomes; for instance, in depression, only approximately 40% of patients remit after first-line antidepressant treatment or psychotherapy (Amick et al., 2015; Cuijpers et al., 2014; Fava and Davidson, 1996; Trivedi et al., 2006). Diagnostic categories in psychiatry have historically been defined based on signs and symptoms, prioritising diagnostic agreement between clinicians rather than underlying biological mechanisms (Freedman et al., 2013; Robins and Guze, 1970). As a result, the usefulness of supervised machine learning methods as diagnostic tools for mental health disorders (i.e., classifying patients vs. healthy controls) is questionable, as they may simply inherit the flaws of current diagnostic categories. Additional challenges in neurological and mental health disorders are comorbidity (i.e., individuals with one disorder often develop another disorder during their lifespan) and the fact that different disorders can share similar symptoms (Kessler et al., 2005). To address the limitations of current diagnostic categories in psychiatry, the National Institute of Mental Health launched the Research Domain Criteria framework (RDoC) in 2009 (https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc) as an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine multiple levels of measures or sources of information (Insel et al., 2010).
Multivariate methods that do not rely on diagnostic categories, such as Canonical Correlation Analysis (CCA) and related methods, have been widely used to uncover latent disease dimensions capturing associations between brain imaging and non-imaging data (e.g., self-report questionnaires, cognitive tests and genetics). The identified latent dimensions provide information on how a set of non-imaging features (e.g.