Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm
Anja Zgodic, Ray Bai, Jiajia Zhang, Alexander C. McLain
While high-dimensional data has been ubiquitous for some time, longitudinal high-dimensional data and grouped (clustered) high-dimensional data are increasingly common in research. For example, some genetic studies gather gene expression levels for an individual on multiple occasions in response to an exposure over time (Banchereau et al., 2016). Other ongoing studies, such as the UK Biobank and the Adolescent Brain Cognitive Development Study, collect high-dimensional genetic/imaging information longitudinally to learn how individual changes in these markers relate to outcomes (Cole, 2020; Saragosa-Harris et al., 2022). Such data usually violates the traditional linear regression assumption that observations are independently and identically distributed, so the analysis should account for the dependence between observations belonging to the same individual. For the low-dimensional setting where n ≫ p, extensive methodology is available for handling such data structures, e.g., linear mixed models (LMMs). The fields of LMMs and high-dimensional linear regression each have extensive bodies of literature, but they remain largely separate, with only a very narrow body of literature at their intersection, i.e., LMMs for high-dimensional longitudinal data (where p ≫ n). Unlike low-dimensional (p ≪ n) LMMs, for which restricted maximum likelihood (REML) methods are readily available, fitting high-dimensional LMMs is considerably more challenging due to the non-convexity of the objective function, which requires iterative fitting procedures involving the inversion of large matrices. The few available methods for high-dimensional LMMs rely on sparsity-inducing penalizations (e.g.
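For reference, the clustered/longitudinal setting described above is typically handled with a linear mixed model; the display below is a standard textbook formulation, with notation (y_i, X_i, Z_i, u_i) chosen here for illustration rather than taken from the paper.

% Standard LMM: subject i contributes n_i correlated observations.
\[
  \mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}_i + \boldsymbol{\varepsilon}_i,
  \qquad
  \mathbf{u}_i \sim \mathcal{N}_q(\mathbf{0}, \boldsymbol{\Sigma}_u),
  \qquad
  \boldsymbol{\varepsilon}_i \sim \mathcal{N}_{n_i}(\mathbf{0}, \sigma^2 \mathbf{I}_{n_i}),
\]
% y_i: the n_i outcomes for subject i; X_i: the n_i x p fixed-effects design matrix; beta: the
% p-vector of fixed effects; Z_i: the n_i x q random-effects design matrix; u_i: subject-level
% random effects inducing within-subject correlation. In the p >> n regime, beta is assumed
% sparse, which motivates the sparsity-inducing approaches mentioned above.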
Oct-18-2023