gene module
Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations
Lin, Zaikang, Chang, Sei, Zweig, Aaron, Azizi, Elham, Knowles, David A.
Modern high-throughput biological datasets with thousands of perturbations provide the opportunity for large-scale discovery of causal graphs that represent the regulatory interactions between genes. Numerous methods have been proposed to infer a directed acyclic graph (DAG) corresponding to the underlying gene regulatory network (GRN) that captures causal gene relationships. However, existing models rely on restrictive assumptions (e.g., linearity, acyclicity), have limited scalability, and/or fail to address the dynamic nature of biological processes such as cellular differentiation. We propose PerturbODE, a novel framework that incorporates biologically informative neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the causal GRN from the neural ODE's parameters. We demonstrate PerturbODE's efficacy in trajectory prediction and GRN inference across simulated and real over-expression datasets.
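To make the idea of reading a regulatory graph off ODE parameters more concrete, the following is a minimal sketch, not PerturbODE itself. It integrates a toy two-layer vector field dx/dt = B·tanh(A·x) − x with forward Euler and scores gene-to-gene influence as |B|·|A|. The form of the vector field, the weight names `A` and `B`, the scoring rule, and the modeling of over-expression as a constant `boost` added to one gene's derivative are all assumptions made for illustration; the abstract does not specify the architecture.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vector_field(x, A, B, boost):
    # Toy dynamics (an assumption): dx/dt = B @ tanh(A @ x) - x + boost.
    # Hidden units can be read as "gene modules" that pool inputs from genes.
    hidden = [math.tanh(h) for h in matvec(A, x)]
    drive = matvec(B, hidden)
    return [d - xi + b for d, xi, b in zip(drive, x, boost)]

def simulate(x0, A, B, boost, dt=0.01, steps=1000):
    """Forward-Euler integration of the toy ODE from initial state x0."""
    x = list(x0)
    for _ in range(steps):
        dx = vector_field(x, A, B, boost)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x

def influence_matrix(A, B):
    # scores[i][j] = sum_k |B[i][k]| * |A[k][j]|: strength of the path
    # gene j -> hidden unit k -> gene i (a hypothetical scoring rule).
    n_hidden, n_inputs = len(A), len(A[0])
    return [[sum(abs(B[i][k]) * abs(A[k][j]) for k in range(n_hidden))
             for j in range(n_inputs)]
            for i in range(len(B))]

# Three genes, two hidden units. Gene 0 feeds hidden unit 0, which drives gene 1.
A = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
B = [[0.0, 0.0],
     [2.0, 0.0],
     [0.0, 0.0]]

baseline = simulate([0.0, 0.0, 0.0], A, B, boost=[0.0, 0.0, 0.0])
perturbed = simulate([0.0, 0.0, 0.0], A, B, boost=[1.0, 0.0, 0.0])
scores = influence_matrix(A, B)
```

In this toy system, over-expressing gene 0 raises gene 1 and leaves gene 2 untouched, and the only nonzero entry of `scores` is the edge from gene 0 to gene 1. Nothing in the vector field forces the resulting graph to be acyclic, which is one reason an ODE formulation can represent feedback loops that a DAG cannot.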
Towards Biologically Plausible and Private Gene Expression Data Generation
Chen, Dingfan, Oestreich, Marie, Afonja, Tejumade, Kerkouche, Raouf, Becker, Matthias, Fritz, Mario
Generative models trained with Differential Privacy (DP) are becoming increasingly prominent in the creation of synthetic data for downstream applications. Existing literature, however, primarily focuses on basic benchmarking datasets and tends to report promising results only for elementary metrics and relatively simple data distributions. In this paper, we initiate a systematic analysis of how DP generative models perform in their natural application scenarios, specifically focusing on real-world gene expression data. We conduct a comprehensive analysis of five representative DP generation methods, examining them from various angles, such as downstream utility, statistical properties, and biological plausibility. Our extensive evaluation illuminates the unique characteristics of each DP generation method, offering critical insights into the strengths and weaknesses of each approach, and uncovering intriguing possibilities for future developments. Perhaps surprisingly, our analysis reveals that most methods are capable of achieving seemingly reasonable downstream utility, according to the standard evaluation metrics considered in existing literature. Nevertheless, we find that none of the DP methods are able to accurately capture the biological characteristics of the real dataset. This observation suggests a potentially over-optimistic assessment of current methodologies in this field and underscores a pressing need for future enhancements in model design.
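One way to see how synthetic data can look reasonable on simple statistics yet miss biological structure is to compare gene-gene co-expression between a real and a synthetic dataset. The sketch below is a hypothetical check, not one of the metrics used in the paper; the function name `coexpression_gap`, the toy gene labels, and the toy values are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def coexpression_gap(real, synthetic):
    """Mean absolute difference between gene-gene correlations in two datasets.

    real, synthetic: dict mapping gene name -> expression values across samples.
    Both dicts are assumed to contain the same genes.
    """
    genes = sorted(real)
    gaps = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r_real = pearson(real[g], real[h])
            r_syn = pearson(synthetic[g], synthetic[h])
            gaps.append(abs(r_real - r_syn))
    return sum(gaps) / len(gaps)

# Toy data: each gene has identical marginal values in both datasets,
# but the synthetic data inverts the co-expression of the two genes.
real = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [2.0, 4.0, 6.0, 8.0]}
synthetic = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [8.0, 6.0, 4.0, 2.0]}

gap = coexpression_gap(real, synthetic)
self_gap = coexpression_gap(real, real)
```

Here per-gene means and variances match exactly between the two datasets, so a metric based only on marginals would report a perfect fit, while the co-expression gap takes its maximum possible value.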
In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes
CRISPR targeting in vivo, especially in mammals, can be difficult and time consuming when attempting to determine the effects of a single gene. However, such studies may be required to identify pathological gene variants with effects in specific cells along a developmental trajectory. To study the function of genes implicated in autism spectrum disorders (ASDs), Jin et al. applied a gene-editing and single-cell–sequencing system, Perturb-Seq, to knock out 35 ASD candidate genes in multiple mouse embryos (see the Perspective by Treutlein and Camp). This method identified networks of gene expression in neuronal and glial cells that suggest new functions in ASD-related genes.
Science, this issue p. [eaaz6063][1]; see also p. [1038][2]
### INTRODUCTION
Human genetic studies have revealed long lists of genes and loci associated with risk for many diseases and disorders, but systematically evaluating their phenotypic effects remains challenging. Without any a priori knowledge, these risk genes could affect any cellular process in any cell type or tissue, which creates an enormous search space for identifying possible downstream effects. New high-throughput approaches are needed to functionally dissect these large gene sets across a spectrum of cell types in vivo.
### RATIONALE
Analysis of trio-based whole-exome sequencing has implicated a large number of de novo loss-of-function variants that contribute to autism spectrum disorder and developmental delay (ASD/ND) risk. Such de novo variants often have large effect sizes, thus providing a key entry point for mechanistic studies. We have developed in vivo Perturb-Seq to allow simultaneous assessment of the individual phenotypes of a panel of such risk genes in the context of the developing mouse brain.
### RESULTS
Using CRISPR-Cas9, we introduced frameshift mutations in 35 ASD/ND risk genes in pools, within the developing mouse neocortex in utero, followed by single-cell transcriptomic analysis of perturbed cells from the early postnatal brain. We analyzed five broad cell classes (cortical projection neurons, cortical inhibitory neurons, astrocytes, oligodendrocytes, and microglia) and selected cells that had received only single perturbations. Using weighted gene correlation network analysis, we identified 14 covarying gene modules that represent transcriptional programs expressed in different classes of cortical cells. These modules included both those affecting common biological processes across multiple cell subsets and others representing cell type–specific features restricted to certain subsets. We estimated the effect size of each perturbation on each of the 14 gene modules by fitting a joint linear regression model, estimating how module gene expression in cells from each perturbation group deviated from their expression level in internal control cells. Perturbations in nine ASD/ND genes had significant effects across five modules across four cell classes, including cortical projection neurons, cortical inhibitory neurons, astrocytes, and oligodendrocytes. Some of these results were validated by using a single-perturbation model as well as a germline-modified mutant mouse model. To establish whether the perturbation-associated gene modules identified in the mouse cerebral cortex are relevant to human biology and ASD/ND pathology, we performed co-analyses of data from ASD and control human brains and human cerebral organoids. Both gene expression and gene covariation (“modularity”) of several of the gene modules identified in the mouse Perturb-Seq analysis are conserved in human brain tissue. Comparison with single-cell data from ASD patients showed overlap in both affected cell types and transcriptomic phenotypes.
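The effect-size estimate described in the results can be sketched for a single module under simplifying assumptions. If every cell carries exactly one perturbation (or is a control) and the model is y_c = β_0 + Σ_g β_g·1[cell c carries perturbation g] + ε_c with no further covariates, the ordinary least-squares estimate of β_g equals the mean module score in group g minus the mean in control cells. The published model may include additional covariates; the per-cell module score, the labels `geneA` and `geneB`, and the numbers below are hypothetical.

```python
from collections import defaultdict

def module_effects(cells, control="control"):
    """Estimate each perturbation's effect on one module score.

    cells: list of (perturbation_label, module_score) pairs, one per cell.
    Returns a dict mapping perturbation -> (group mean - control mean), which
    is the OLS coefficient for a one-hot design with control as the reference.
    """
    by_group = defaultdict(list)
    for label, score in cells:
        by_group[label].append(score)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    baseline = means[control]
    return {g: m - baseline for g, m in means.items() if g != control}

# Toy module scores for control cells and two hypothetical perturbations.
cells = [
    ("control", 1.0), ("control", 1.2), ("control", 0.8),
    ("geneA", 1.6), ("geneA", 1.4),
    ("geneB", 1.0), ("geneB", 1.0),
]
effects = module_effects(cells)
```

In this toy input the `geneA` group sits about 0.5 above the control mean and the `geneB` group shows no shift. Repeating the calculation for each module would give a perturbation-by-module matrix of effect sizes; assessing significance would additionally require standard errors, which this sketch omits.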
### CONCLUSION
In vivo Perturb-Seq can serve as a scalable tool for systems genetic studies of large gene panels to reveal their cell-intrinsic functions at single-cell resolution in complex tissues. In this work, we demonstrated the application of in vivo Perturb-Seq to ASD/ND risk genes in the developing brain. This method can be applied across diverse diseases and tissues in the intact organism.
Figure: In vivo Perturb-Seq identified neuron and glia-associated effects by perturbations of risk genes implicated in ASD/ND. De novo risk genes in this study were chosen from Satterstrom et al. (2018), and co-analysis with ASD patient data at bottom right is from Velmeshev et al. (2019); full citations for both are included in the full article online.
The number of disease risk genes and loci identified through human genetic studies far outstrips the capacity to systematically study their functions. We applied a scalable genetic screening approach, in vivo Perturb-Seq, to functionally evaluate 35 autism spectrum disorder/neurodevelopmental delay (ASD/ND) de novo loss-of-function risk genes. Using CRISPR-Cas9, we introduced frameshift mutations in these risk genes in pools, within the developing mouse brain in utero, followed by single-cell RNA-sequencing of perturbed cells in the postnatal brain. We identified cell type–specific and evolutionarily conserved gene modules from both neuronal and glial cell classes. Recurrent gene modules and cell types are affected across this cohort of perturbations, representing key cellular effects across sets of ASD/ND risk genes. In vivo Perturb-Seq allows us to investigate how diverse mutations affect cell types and states in the developing organism.
[1]: /lookup/doi/10.1126/science.aaz6063
[2]: /lookup/doi/10.1126/science.abf3661
Transcriptome and epigenome landscape of human cortical development modeled in organoids
The human cerebral cortex has undergone an extraordinary increase in size and complexity during mammalian evolution. Cortical cell lineages are specified in the embryo, and genetic and epidemiological evidence implicates early cortical development in the etiology of neuropsychiatric disorders such as autism spectrum disorder (ASD), intellectual disabilities, and schizophrenia. Most of the disease-implicated genomic variants are located outside of genes, and the interpretation of noncoding mutations is lagging behind owing to limited annotation of functional elements in the noncoding genome. We set out to discover gene-regulatory elements and chart their dynamic activity during prenatal human cortical development, focusing on enhancers, which carry most of the weight in regulating gene expression. We longitudinally modeled human brain development using human induced pluripotent stem cell (hiPSC)–derived cortical organoids and compared organoids to isogenic fetal brain tissue. Fetal fibroblast–derived hiPSC lines were used to generate cortically patterned organoids and to compare the organoids' epigenome and transcriptome to those of isogenic fetal brains and external datasets. Organoids model cortical development between 5 and 16 postconception weeks, thus enabling us to study transitions from cortical stem cells to progenitors to early neurons. The greatest changes occur at the transition from stem cells to progenitors. The regulatory landscape encompasses a total set of 96,375 enhancers linked to target genes, with 49,640 enhancers being active in organoids but not in mid-fetal brain, suggesting major roles in cortical neuron specification. Enhancers that gained activity in the human lineage are active in the earliest stages of organoid development, when they target genes that regulate the growth of radial glial cells.
Parallel weighted gene coexpression network analysis (WGCNA) of transcriptome and enhancer activities defined a number of modules of coexpressed genes and coactive enhancers, following just six and four global temporal patterns that we refer to as supermodules, likely reflecting fundamental programs in embryonic and fetal brain. Correlations between gene expression and enhancer activity allowed stratifying enhancers into two categories: activating regulators (A-regs) and repressive regulators (R-regs).
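The stratification of enhancers by their correlation with target-gene expression can be sketched as follows. This is a simplified illustration, assuming that a positive correlation across time points marks an activating regulator and a negative one a repressive regulator; the cutoff `min_abs_r`, the function name `classify_enhancer`, and the toy profiles are hypothetical and not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def classify_enhancer(enhancer_activity, gene_expression, min_abs_r=0.5):
    """Label an enhancer by the sign of its correlation with its target gene.

    min_abs_r is an assumed cutoff below which no label is assigned.
    """
    r = pearson(enhancer_activity, gene_expression)
    if r >= min_abs_r:
        return "A-reg"   # activity rises and falls with expression
    if r <= -min_abs_r:
        return "R-reg"   # activity moves opposite to expression
    return "unclassified"

# Toy profiles over four developmental time points.
gene = [1.0, 2.0, 3.0, 4.0]
labels = {
    "rising": classify_enhancer([0.1, 0.2, 0.4, 0.5], gene),
    "falling": classify_enhancer([0.9, 0.6, 0.4, 0.1], gene),
    "noisy": classify_enhancer([0.3, 0.1, 0.3, 0.1], gene),
}
```

With these toy inputs the rising enhancer is labeled an A-reg, the falling one an R-reg, and the noisy one stays unclassified because its correlation falls below the cutoff.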