High-throughput sequencing has become the method of choice for genome-wide transcriptomic analyses as its price has fallen substantially in recent years. Nevertheless, the high cost of standard RNA library preparation and the complexity of the underlying data analysis still prevent this approach from becoming as routine as quantitative PCR (qPCR), especially when many samples need to be analyzed. To reduce this cost, the emerging single-cell transcriptomics field has adopted the sample barcoding/early multiplexing principle, which cuts both RNA-seq cost and preparation time by generating a single sequencing library that contains multiple distinct samples or cells. Such a strategy could also reduce the cost and processing time of bulk RNA sequencing of large sample sets [2,3,4,5].
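Early multiplexing works because every read carries a short sample barcode, so a pooled library can be split back into per-sample reads computationally. The sketch below illustrates that demultiplexing step under stated assumptions: each read begins with a fixed-length 8-nt barcode, one mismatch is tolerated, and the barcode sequences and sample names are hypothetical. Real protocols differ (barcodes may sit in a separate index read, alongside UMIs), so this is an illustration of the principle, not any specific kit's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical 8-nt sample barcodes; real kits define their own barcode sets.
# These three differ from each other at 5 or more positions.
SAMPLE_BARCODES = {
    "AACGTGAT": "sample_1",
    "AAACATCG": "sample_2",
    "ATGCCTAA": "sample_3",
}
BARCODE_LEN = 8


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches the read prefix, or None if no unique match."""
    prefix = read[:BARCODE_LEN]
    hits = [name for bc, name in SAMPLE_BARCODES.items()
            if hamming(prefix, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: List[str]) -> Dict[str, List[str]]:
    """Split pooled reads by sample and strip the barcode; unmatched reads go to 'undetermined'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = assign_sample(read)
        groups[sample or "undetermined"].append(read[BARCODE_LEN:])
    return dict(groups)


# Toy pooled library: the third read has one sequencing error in its barcode.
pooled = ["AACGTGATGGCTTAC", "AAACATCGTTTAGGC", "AACGTGTTCCGATAA", "GGGGGGGGAAAA"]
by_sample = demultiplex(pooled)
```

With one mismatch allowed, the third read is still assigned to sample_1, while the fourth read matches no barcode and lands in "undetermined". Requiring a unique match avoids ambiguous assignments; barcodes that are far apart in Hamming distance make a one-mismatch tolerance safe.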
The brain is responsible for cognition, behavior, and much of what makes us uniquely human. Brain development is a highly complex process that relies on precise regulation of molecular and cellular events grounded in the spatiotemporal regulation of the transcriptome. Disruption of this regulation can lead to neuropsychiatric disorders. Yet the regulatory, epigenomic, and transcriptomic features of the human brain have not been comprehensively compiled across time, regions, or cell types. Understanding the etiology of neuropsychiatric disorders requires knowledge not just of endpoint differences between healthy and diseased brains but also of the developmental and cellular contexts in which these differences arise. Moreover, an emerging body of research indicates that many aspects of human brain development and physiology are not well recapitulated in model organisms; neuropsychiatric disorders therefore need to be understood in the broader context of the developing and adult human brain. Here we describe the generation and analysis of a variety of genomic data modalities at the tissue and single-cell levels, including transcriptome, DNA methylation, and histone modification data, across multiple brain regions in samples ranging in age from embryonic development through adulthood. We observed a widespread transcriptomic transition that begins during late fetal development and is marked by sharply decreased regional differences. This reduction coincided with increases in the transcriptional signatures of mature neurons and in the expression of genes associated with dendrite development, synapse development, and neuronal activity, all of which were temporally synchronous across neocortical areas, as well as genes associated with myelination and oligodendrocytes, which were asynchronous.
Moreover, genes including MEF2C, SATB2, and TCF4, with genetic associations to multiple brain-related traits and disorders, converged in a small number of modules exhibiting spatial or spatiotemporal specificity. We generated and applied our dataset to document transcriptomic and epigenetic changes across human development and then related those changes to major neuropsychiatric disorders. These data allowed us to identify genes, cell types, gene coexpression modules, and spatiotemporal loci where disease risk might converge, demonstrating the utility of the dataset and providing new insights into human development and disease.
In recent years, advances in single-cell RNA-seq (scRNA-seq) techniques have enabled large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning, such as data clustering, has become central to identifying and characterizing novel cell types and gene expression patterns. In this study, we review existing scRNA-seq data clustering methods, with critical insights into their advantages and limitations. We also review upstream scRNA-seq data processing techniques such as quality control, normalization, and dimension reduction. Finally, we conduct performance comparison experiments to evaluate several popular scRNA-seq clustering approaches on two single-cell transcriptomic datasets.
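The upstream steps named above (quality control, normalization, dimension reduction) followed by clustering can be sketched end to end in plain NumPy. This is a minimal illustration under assumptions of my own choosing, not one of the pipelines the review benchmarks: cells with fewer than a hypothetical `min_genes` detected genes are dropped, counts are scaled to 10,000 per cell and log1p-transformed, PCA is computed via SVD, and plain k-means is used as the clustering step purely for simplicity (many dedicated scRNA-seq methods use graph-based clustering instead). The count matrix is a synthetic toy example.

```python
import numpy as np


def quality_control(counts, min_genes=3):
    """Keep cells (rows) in which at least `min_genes` genes have nonzero counts."""
    keep = (counts > 0).sum(axis=1) >= min_genes
    return counts[keep], keep


def normalize(counts, scale=1e4):
    """Scale each cell to `scale` total counts, then log1p-transform."""
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / lib_size * scale)


def pca(x, n_components=2):
    """Project mean-centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per cell."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic cells x genes counts: two expression programs plus one poor-quality cell.
counts = np.array([
    [50, 40, 45, 2, 3, 2],
    [60, 35, 50, 3, 2, 2],
    [55, 45, 40, 2, 2, 3],
    [2, 3, 2, 50, 45, 40],
    [3, 2, 2, 60, 40, 55],
    [2, 2, 3, 45, 50, 50],
    [0, 0, 0, 5, 0, 0],  # only one detected gene: removed by QC
])
filtered, keep = quality_control(counts, min_genes=3)
embedding = pca(normalize(filtered), n_components=2)
labels = kmeans(embedding, k=2)
```

Running QC before normalization matters here: a cell with zero total counts would otherwise cause a division by zero. Cluster label numbers are arbitrary; only the grouping of cells is meaningful.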
We propose a probabilistic model for interpreting gene expression levels observed through single-cell RNA sequencing. In the model, each cell has a low-dimensional latent representation. Additional latent variables account for technical effects that may erroneously set some observed gene expression levels to zero. Conditional distributions are specified by neural networks, giving the model enough flexibility to fit the data well. We use variational inference and stochastic optimization to approximate the posterior distribution. The inference procedure scales to over one million cells, whereas competing algorithms do not. Even on smaller datasets, the proposed procedure outperforms state-of-the-art methods such as ZIFA and ZINB-WaVE on several tasks. We also extend our framework to account for batch effects and other confounding factors, and propose a Bayesian hypothesis test for differential expression that outperforms DESeq2.
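The latent variables that "erroneously set some observations to zero" can be marginalized into a zero-inflated count likelihood. The sketch below assumes a zero-inflated negative binomial (ZINB) observation model, the family used by ZINB-WaVE mentioned above; the text itself does not specify the distribution. Here `mu` (mean), `theta` (inverse dispersion), and `pi` (dropout probability) are plain numbers standing in for what the neural networks would output from a cell's latent representation, and genes are assumed independent given that representation. The function names are hypothetical.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Negative binomial log-probability with mean `mu` > 0 and inverse dispersion `theta` > 0.

    NB(x) = Gamma(x + theta) / (Gamma(theta) x!) * (theta/(theta+mu))^theta * (mu/(theta+mu))^x
    """
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )


def zinb_log_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability `pi` (0 <= pi < 1) a technical zero, else an NB draw."""
    if x == 0:
        # A zero can come from either the dropout component or the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def cell_log_likelihood(counts, mu, theta, pi):
    """Sum per-gene ZINB log-probabilities for one cell (genes independent given its latent code)."""
    return sum(zinb_log_pmf(x, m, t, p) for x, m, t, p in zip(counts, mu, theta, pi))
```

For example, `cell_log_likelihood([0, 3, 12], [2.0, 4.0, 10.0], [1.5, 1.5, 1.5], [0.2, 0.1, 0.05])` scores one cell's three observed counts. In a variational setup, a term like this would serve as the reconstruction part of the objective, averaged over samples of the latent representation; that training loop is not shown.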
The Laboratory of Computational Biology (Stein Aerts lab) is part of the VIB Center for Brain & Disease Research and the Department of Human Genetics (University of Leuven, Belgium). Our lab is a "humid" lab, half wet and half dry. In the wet lab, we apply high-throughput technologies, such as RNA-seq for transcriptomics and ATAC-seq and ChIP-seq for epigenomic profiling, to decipher enhancer logic and map gene regulatory networks. To test the activity of promoters and enhancers, we use massively parallel enhancer-reporter assays. Finally, to map high-resolution landscapes of possible cellular states, we use single-cell transcriptomics and single-cell epigenomics.