Transcriptomic signatures across human tissues identify functional rare genetic variation

Science

Every human genome contains tens of thousands of rare genetic variants, which include single nucleotide changes, insertions or deletions, and larger structural variants, and some may have a functional effect. Ferraro et al. examined data from individuals in the Genotype-Tissue Expression (GTEx) project for outliers across tissues in gene expression, splicing, and allele-specific expression. Single rare variants were observed that affected the expression and allele-specific expression of multiple genes and, in the case of a gene fusion event, splicing. Experimental and computational validation suggest that many individuals carry more than 50 rare variants that affect transcription in some way. Although most variants were predicted to not affect an individual's phenotype, a small percentage showed likely disease-related associations, emphasizing the importance of studying the impact of rare genetic variation on the transcriptome.

Science, this issue p. [eaaz5900][1]

### INTRODUCTION

The human genome contains tens of thousands of rare (minor allele frequency <1%) variants, some of which contribute to disease risk. Using 838 samples with whole-genome and multitissue transcriptome sequencing data in the Genotype-Tissue Expression (GTEx) project version 8, we assessed how rare genetic variants contribute to extreme patterns in gene expression (eOutliers), allelic expression (aseOutliers), and alternative splicing (sOutliers). We integrated these three signals across 49 tissues with genomic annotations to prioritize high-impact rare variants (RVs) that associate with human traits.

### RATIONALE

Outlier gene expression aids in identifying functional RVs. Transcriptome sequencing provides diverse measurements beyond gene expression, including allele-specific expression and alternative splicing, which can provide additional insight into RV functional effects.

### RESULTS

After identifying multitissue eOutliers, aseOutliers, and sOutliers, we found that outlier individuals of each type were significantly more likely to carry an RV near the corresponding gene. Among eOutliers, we observed strong enrichment of rare structural variants. sOutliers were particularly enriched for RVs that disrupted or created a splicing consensus sequence. aseOutliers provided the strongest enrichment signal when evaluated from just a single tissue. We developed Watershed, a probabilistic model for personal genome interpretation that improves over standard genomic annotation–based methods for scoring RVs by integrating these three transcriptomic signals from the same individual and replicates in an independent cohort. To assess whether outlier RVs identified in GTEx associate with traits, we evaluated these variants for association with diverse traits in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. We found that transcriptome-assisted prioritization identified RVs with larger trait effect sizes and was a better predictor of effect size than genomic annotation alone.

### CONCLUSION

With >800 genomes matched with transcriptomes across 49 tissues, we were able to study RVs that underlie extreme changes in the transcriptome. To capture the diversity of these extreme changes, we developed and integrated approaches to identify expression, allele-specific expression, and alternative splicing outliers, and characterized the RV landscape underlying each outlier signal. We demonstrate that personal genome interpretation and RV discovery are enhanced by using these signals. This approach provides a new means to integrate a richer set of functional RVs into models of genetic burden, improve disease gene identification, and enable the delivery of precision genomics.

Figure: Transcriptomic signatures identify functional rare genetic variation. We identified genes in individuals that show outlier expression, allele-specific expression, or alternative splicing and assessed enrichment of nearby rare variation. We integrated these three outlier signals with genomic annotation data to prioritize functional RVs and to intersect those variants with disease loci to identify potential RV trait associations.

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.

[1]: /lookup/doi/10.1126/science.aaz5900
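
The multitissue expression-outlier idea in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the toy data layout, and the |median Z| >= 3 cutoff are assumptions chosen for illustration, and a real analysis would first correct expression for technical and biological covariates.

```python
from statistics import mean, median, stdev


def zscores(values):
    """Standardize one tissue's expression values across individuals."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def multitissue_outliers(expr_by_tissue, threshold=3.0):
    """Flag individuals whose median Z-score across tissues is extreme.

    expr_by_tissue maps tissue -> {individual: expression} for ONE gene.
    The threshold is an illustrative assumption, not a value from the text.
    """
    per_individual = {}
    for by_individual in expr_by_tissue.values():
        ids = list(by_individual)
        for ind, z in zip(ids, zscores([by_individual[i] for i in ids])):
            per_individual.setdefault(ind, []).append(z)
    medians = {ind: median(zs) for ind, zs in per_individual.items()}
    return {ind: m for ind, m in medians.items() if abs(m) >= threshold}


def toy_tissue(outlier_value):
    """Hypothetical data: 19 typical individuals plus one extreme one."""
    values = {f"ind{k}": 11.0 if k % 2 == 0 else 9.0 for k in range(19)}
    values["ind_out"] = outlier_value
    return values


toy = {"liver": toy_tissue(50.0), "lung": toy_tissue(48.0), "skin": toy_tissue(52.0)}
flagged = multitissue_outliers(toy)
print(sorted(flagged))
```

Taking the median across tissues means a gene must look extreme in most of an individual's measured tissues before it is flagged, which is one simple way to favor shared, plausibly genetic effects over single-tissue noise.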


Cell type-specific genetic regulation of gene expression across human tissues

Science

Understanding how human genetic variation affects phenotype requires tissue- or even cell type–specific measurements. Kim-Hellmuth et al. used computational methods to identify cell-type proportions within bulk tissues in the Genotype-Tissue Expression (GTEx) project dataset to identify cell-type interaction quantitative trait loci and map these to genetic variants correlated with expression or splicing differences between individuals. By characterizing the cellular context, this study illustrates how genetic variants that operate in a cell type–specific manner affect gene regulation and can be linked to complex traits. This deconvolution and analysis of cell types from bulk tissues allows greater precision in understanding how phenotypes are linked to genetic variation.

Science, this issue p. [eaaz8528][1]

### INTRODUCTION

Efforts to map quantitative trait loci (QTLs) across human tissues by the GTEx Consortium and others have identified expression and splicing QTLs (eQTLs and sQTLs, respectively) for a majority of genes. However, these studies were largely performed with gene expression measurements from bulk tissue samples, thus obscuring the cellular specificity of genetic regulatory effects and in turn limiting their functional interpretation. Identifying the cell type (or types) in which a QTL is active will be key to uncovering the molecular mechanisms that underlie complex trait variation. Recent studies demonstrated the feasibility of identifying cell type–specific QTLs from bulk tissue RNA-sequencing data by using computational estimates of cell type proportions. To date, such approaches have only been applied to a limited number of cell types and tissues. By applying this methodology to GTEx tissues for a diverse set of cell types, we aim to characterize the cellular specificity of genetic effects across human tissues and to describe the contribution of these effects to complex traits.

### RATIONALE

A growing number of in silico cell type deconvolution methods and associated reference panels with cell type–specific marker genes enable the robust estimation of the enrichment of specific cell types from bulk tissue gene expression data. We benchmarked and used enrichment estimates for seven cell types (adipocytes, epithelial cells, hepatocytes, keratinocytes, myocytes, neurons, and neutrophils) across 35 tissues from the GTEx project to map QTLs that are specific to at least one cell type. We mapped such cell type–interaction QTLs for expression and splicing (ieQTLs and isQTLs, respectively) by testing for interactions between genotype and cell type enrichment.

### RESULTS

Using 43 pairs of tissues and cell types, we found 3347 protein-coding and long intergenic noncoding RNA (lincRNA) genes with an ieQTL and 987 genes with an isQTL (at 5% false discovery rate in each pair). To validate these findings, we tested the QTLs for replication in available external datasets and applied an independent validation using allele-specific expression from eQTL heterozygotes. We analyzed the cell type–interaction QTLs for patterns of tissue sharing and found that ieQTLs are enriched for genes with tissue-specific eQTLs and are generally not shared across unrelated tissues, suggesting that tissue-specific eQTLs originate in tissue-specific cell types. Last, we tested the ieQTLs and isQTLs for colocalization with genetic associations for 87 complex traits. We show that cell type–interaction QTLs are enriched for complex trait associations and identify colocalizations for hundreds of loci that were undetected in bulk tissue, corresponding to an increase of >50% over colocalizations with standard QTLs. Our results also reveal the cellular specificity and potential origin for a similar number of colocalized standard QTLs.

### CONCLUSION

The ieQTLs and isQTLs identified for seven cell types across GTEx tissues suggest that the large majority of cell type–specific QTLs remains to be discovered. Our colocalization results indicate that comprehensive mapping of cell type–specific QTLs will be highly valuable for gaining a mechanistic understanding of complex trait associations. We anticipate that the approaches presented here will complement studies mapping QTLs in single cells.

Figure: Detection of cell type–specific effects on gene expression. The enrichment of seven cell types is calculated across GTEx tissues, enabling mapping of cell type–interaction QTLs for expression and splicing by testing for significant interactions between genotypes and cell type enrichments. Linking these QTLs to complex trait associations enables discovery of >50% more colocalizations compared with standard QTLs and reveals the cellular specificity of traits.

The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type–interaction QTLs for seven cell types and show that cell type–interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type–interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.

[1]: /lookup/doi/10.1126/science.aaz8528
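
Testing for an interaction between genotype and cell type enrichment, as described in this abstract, amounts to fitting a linear model with a product term. The sketch below is a minimal illustration under stated assumptions, not the study's method: the function names and toy data are hypothetical, it fits ordinary least squares on noiseless data using only the standard library, and it estimates the interaction coefficient without the covariates, significance testing, or false discovery rate control that an actual ieQTL analysis would need.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_interaction(genotypes, enrichment, expression):
    """Least-squares fit of expression ~ 1 + g + e + g*e.

    Returns [intercept, genotype, enrichment, interaction] coefficients.
    A nonzero interaction term means the genotype effect changes with
    the estimated abundance of the cell type.
    """
    design = [[1.0, g, e, g * e] for g, e in zip(genotypes, enrichment)]
    p = 4
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, expression))
           for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: genotype dosage 0/1/2 crossed with three
# enrichment levels, and expression generated from known coefficients.
genotypes = [0.0, 1.0, 2.0] * 3
enrichment = [0.1] * 3 + [0.5] * 3 + [0.9] * 3
expression = [1.0 + 0.5 * g + 2.0 * e + 1.5 * g * e
              for g, e in zip(genotypes, enrichment)]

beta = fit_interaction(genotypes, enrichment, expression)
print([round(b, 6) for b in beta])
```

In this toy setup the genotype effect on expression is 0.5 + 1.5 × enrichment, so it grows as the cell type becomes more abundant, which is the pattern an interaction QTL is meant to capture.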


Determinants of telomere length across human tissues

Science

Telomeres are DNA-protein complexes that protect chromosome ends. Their length is of great interest because short telomeres are associated with specific diseases and with aging. Demanelis et al. measured telomere length from 952 Genotype-Tissue Expression (GTEx) project donors across tissues, of which 24 tissue types have measurements for more than 25 samples. This dataset shows that telomere length is not constant but is correlated across tissues. Most tissue telomeres shorten with age, but some, such as those in the testis and cerebellum, do not. In African Americans, telomeres are longer on average than those from individuals of primarily European descent across many tissue types. This observation is consistent with variability being passed from germ cells to zygote to differentiated cells during development.

Science, this issue p. [eaaz6876][1]

### INTRODUCTION

Telomeres are DNA-protein complexes located at the end of chromosomes that protect chromosome ends from degradation and fusion. The DNA component of telomeres shortens with each cell division, eventually triggering cellular senescence. Telomere length (TL) in blood cells has been studied extensively as a biomarker of human aging and risk factor for age-related diseases. The extent to which TL in whole blood reflects TL in disease-relevant tissue types is unknown, and the variability in TL across human tissues has not been well characterized. The postmortem tissue samples collected by the Genotype-Tissue Expression (GTEx) project provide an opportunity to study TL in many human tissue types, and accompanying data on inherited genetic variation, gene expression, and donor characteristics enable us to examine demographic, genetic, and biologic determinants and correlates of TL within and across tissue types.

### RATIONALE

To better understand variation in and determinants of TL, we measured relative TL (RTL, telomere repeat abundance in a DNA sample relative to a standard sample) in more than 25 tissue types from 952 GTEx donors (deceased, aged 20 to 70 years old). RTL was measured for 6391 unique tissue samples using a Luminex assay, generating the largest publicly available multitissue TL dataset. We integrated our RTL measurements with data on GTEx donor characteristics, inherited genetic variation, and tissue-specific expression and analyzed relationships between RTL and covariates using linear mixed models (across all tissues and within tissues). Through this analysis, we sought to accomplish four goals: (i) characterize sources of variation in TL, (ii) evaluate whole-blood TL as a proxy for TL in other tissue types, (iii) examine the relationship between age and TL across tissue types, and (iv) describe biological determinants and correlates of TL.

### RESULTS

Variation in RTL was attributable to tissue type, donor, and age and, to a lesser extent, race or ethnicity, smoking, and inherited variants known to affect leukocyte TL. RTLs were generally positively correlated among tissues, and whole-blood RTL was a proxy for RTL in most tissues. RTL varied across tissue types and was shortest in whole blood and longest in testis. RTL was inversely associated with age in most tissues, and this association was strongest for tissues with shorter average RTL. African ancestry was associated with longer RTL across all tissues and within specific tissue types, suggesting that ancestry-based differences in TL exist in germ cells and are transmitted to the zygote. A polygenic score consisting of inherited variants known to affect leukocyte TL was associated with RTL across all tissues, and several of these TL-associated variants affected expression of nearby genes in multiple tissue types. Carriers of rare, loss-of-function variants in TL-maintenance genes had shorter RTL (based on analysis of multiple tissue types), suggesting that these variants may contribute to shorter TL in individuals from the general population. Components of telomerase, a TL maintenance enzyme, were more highly expressed in testis than in any other tissue. We found evidence that RTL may mediate the effect of age on gene expression in human tissues.

### CONCLUSION

We have characterized the variability in TL across many human tissue types and the contributions of aging, ancestry, genetic variation, and other biologic processes to this variability. The correlation observed among TL measures from different tissues highlights the existence of host factors with effects on TL that are shared across tissue types (e.g., TL in the zygote). These results have important implications for the interpretation of epidemiologic studies of leukocyte TL and disease.

Figure: TL in human tissues. Using a Luminex-based assay, TL was measured in DNA samples from >25 different human tissue types from 952 deceased donors in the GTEx project. TL within tissue types is determined by numerous factors, including zygotic TL, age, and exposures. TL differs across tissues and correlates among tissue types. TL in most tissues declines with age.

Telomere shortening is a hallmark of aging. Telomere length (TL) in blood cells has been studied extensively as a biomarker of human aging and disease; however, little is known regarding variability in TL in nonblood, disease-relevant tissue types. Here, we characterize variability in TLs from 6391 tissue samples, representing >20 tissue types and 952 individuals from the Genotype-Tissue Expression (GTEx) project. We describe differences across tissue types, positive correlation among tissue types, and associations with age and ancestry. We show that genetic variation affects TL in multiple tissue types and that TL may mediate the effect of age on gene expression. Our results provide the foundational knowledge regarding TL in healthy tissues that is needed to interpret epidemiological studies of TL and human health.

[1]: /lookup/doi/10.1126/science.aaz6876


The impact of sex on gene expression across human tissues

Science

In humans, the inheritance of the XX or XY set of sex chromosomes is responsible for most individuals developing into adults expressing male or female sex-specific traits. However, the degree to which sex-biased gene expression occurs in tissues, especially those that do not contribute to characteristic sexually dimorphic traits, is unknown. Oliva et al. examined Genotype-Tissue Expression (GTEx) project data and found that 37% of genes exhibit tissue-specific, sex-biased expression in at least one of the 44 tissues studied. They also identified sex-specific variation in cellular composition across tissues. Overall, the effects of sex on gene expression were small, but they were genome-wide and mostly mediated through transcription factor binding. With sex-biased gene expression associated with loci identified in genome-wide association studies, this study lays the groundwork for identifying the molecular basis of male- and female-based diseases.

Science, this issue p. [eaba3066][1]

### INTRODUCTION

Many complex human phenotypes, including diseases, exhibit sex-differentiated characteristics. These sex differences have been variously attributed to hormones, sex chromosomes, genotype × sex effects, differences in behavior, and differences in environmental exposures; however, their mechanisms and underlying biology remain largely unknown. The Genotype-Tissue Expression (GTEx) project provides an opportunity to investigate the prevalence and genetic mechanisms of sex differences in the human transcriptome by surveying many tissues that have not previously been characterized in this manner.

### RATIONALE

To characterize sex differences in the human transcriptome and its regulation, and to discover how sex and genetics interact to influence complex traits and disease, we generated a catalog of sex differences in gene expression and its genetic regulation across 44 human tissue sources surveyed by the GTEx project (v8 data release), analyzing 16,245 RNA-sequencing samples and genotypes of 838 adult individuals. We report sex differences in gene expression levels, tissue cell type composition, and cis expression quantitative trait loci (cis-eQTLs). To assess their impact, we integrated these results with gene function, transcription factor binding annotation, and genome-wide association study (GWAS) summary statistics of 87 GWASs.

### RESULTS

Sex effects on gene expression are ubiquitous (13,294 sex-biased genes across all tissues). However, these effects are small and largely tissue-specific. Genes with sex-differentiated expression are not primarily driven by tissue-specific gene expression and are involved in a diverse set of biological functions, such as drug and hormone response, embryonic development and tissue morphogenesis, fertilization, sexual reproduction and spermatogenesis, fat metabolism, cancer, and immune response. Whereas X-linked genes with higher expression in females suggest candidates for escape from X-chromosome inactivation, sex-biased expression of autosomal genes suggests hormone-related transcription factor regulation and a role for additional transcription factors, as well as sex-differentiated distribution of epigenetic marks, particularly histone H3 Lys27 trimethylation (H3K27me3). Sex differences in the genetic regulation of gene expression are much less common (369 sex-biased eQTLs across all tissues) and are highly tissue-specific. We identified 58 gene-trait associations driven by genetic regulation of gene expression in a single sex. These include loci where sex-differentiated cell type abundances mediate genotype-phenotype associations, as well as loci where sex may play a more direct role in the underlying molecular mechanism of the association. For example, we identified a female-specific eQTL in liver for the hexokinase HKDC1 that influences glucose metabolism in pregnant females, which is subsequently reflected in the birth weight of the offspring.

### CONCLUSION

By integrating sex-aware analyses of GTEx data with gene function and transcription factor binding annotations, we describe tissue-specific and tissue-shared drivers and mechanisms contributing to sex differences in the human transcriptome and eQTLs. We discovered multiple sex-differentiated genetic effects on gene expression that colocalize with complex trait genetic associations, thereby facilitating the mechanistic interpretation of GWAS signals. Because the causative tissue is unknown for many phenotypes, analysis of the diverse GTEx tissue collection can serve as a powerful resource for investigations into the basis of sex-biased traits. This work provides an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

Figure: Sex affects gene expression and its genetic regulation across tissues. Sex effects on gene expression were measured in 44 GTEx human tissue sources and integrated with genotypes of 838 subjects. Sex-biased expression is present in numerous biological pathways and is associated with sex-differentiated transcriptional regulation. Sex-biased expression quantitative trait loci in cis (sex-biased eQTLs) are partially mediated by cellular abundances and reveal gene-trait associations. TT, AT, and AA are genotypes for a single-nucleotide polymorphism; TF, transcription factor.

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

[1]: /lookup/doi/10.1126/science.aba3066


The Real Threat to Business Schools from Artificial Intelligence - Knowledge@Wharton

#artificialintelligence

Artificial intelligence (AI) will change the way we learn and work in the near future. Nearly 400 million workers globally will change their occupations in the next 10 years, and business schools are uniquely situated to respond to the shifts coming to the future of work. However, a recent study, "Implications of Artificial Intelligence on Business Schools and Lifelong Learning," shows that business schools remain cautious in adapting management education to address the changing needs of students, workers and organizations, writes Anne Trumbore in this opinion piece. Trumbore, one of the study's coauthors, is senior director of Wharton Online, a strategic digital learning initiative at the Wharton School of the University of Pennsylvania. In the past few weeks, COVID-19 has moved hundreds of millions of students around the globe from physical to online classes.


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

#artificialintelligence

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations.


Realizing the Potential of Data Science

Communications of the ACM

The ability to manipulate and understand data is increasingly critical to discovery and innovation. As a result, we see the emergence of a new field--data science--that focuses on the processes and systems that enable us to extract knowledge or insight from data in various forms and translate it into action. In practice, data science has evolved as an interdisciplinary field that integrates approaches from such data-analysis fields as statistics, data mining, and predictive analytics and incorporates advances in scalable computing and data management. But as a discipline, data science is only in its infancy. The challenge of developing data science in a way that achieves its full potential raises important questions for the research and education community: How can we evolve the field of data science so it supports the increasing role of data in all spheres? How do we train a workforce of professionals who can use data to its best advantage? What should we teach them? What can government agencies do to help maximize the potential of data science to drive discovery and address current and future needs for a workforce with data science expertise?


DATA SCIENTIST

#artificialintelligence

The University of Pennsylvania, the largest private employer in Philadelphia, is a world-renowned leader in education, research, and innovation. This historic, Ivy League school consistently ranks among the top 10 universities in the annual U.S. News & World Report survey. Penn has 12 highly-regarded schools that provide opportunities for undergraduate, graduate and continuing education, all influenced by Penn's distinctive interdisciplinary approach to scholarship and learning.


Open Positions Faculty Affairs & Professional Development Perelman School of Medicine at the University of Pennsylvania

#artificialintelligence

The Department of Pathology and Laboratory Medicine at the Perelman School of Medicine at the University of Pennsylvania seeks candidates for a Full, Assistant, and/or Associate Professor position in the tenure track. The successful applicant will have experience in the field of machine learning applied to image analysis, and ideally will also have either clinical training or research experience in Pathology. Responsibilities include the development of an independent research program in the area of image analysis/machine learning as applied to digital histopathology images. Opportunities for collaborative work using radiologic images via partnership with our Center for Biomedical Image Computing and Analytics in Radiology (which houses a high-speed computational cluster for image analytics) are also available, and the successful candidate would ideally be poised to work in both areas. For the higher ranks, the candidate must have demonstrated experience in computational analytics using machine learning in a biomedical setting.