This Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: (1) identification of novel regulatory variants located in noncoding domains and their function as applied to pharmacoepigenomics; (2) patient stratification from medical records; and (3) prediction of drugs, targets, and their interactions. Deep learning encapsulates a family of machine learning algorithms that over the last decade has transformed many important subfields of artificial intelligence (AI) and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. We anticipate that in the future deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical, and demographic datasets.
Cancer is driven by small changes in genomes that provide cells with evolutionary advantages. Many other factors contribute to the complexity of cancer, including diversity of cancers across anatomical sites, differences among cells within individual tumours, and genetic and environmental factors. The activities of genes, transcripts, and proteins in many cancer types are now systematically mapped in massive international efforts. We need to carefully analyse these complex datasets to better understand the basic biology of cancer and its driver mechanisms, treatment opportunities, and biomarkers. We are part of the Ontario Institute for Cancer Research (OICR), a top-ranking translational cancer research institute located in the Discovery District of downtown Toronto, one of the top three biotechnology hubs in North America.
Note: MF Matrix factorization; BMF Bayesian matrix factorization; KBMF Kernel Bayesian matrix factorization; KRR Kernel ridge regression; NBR Network based regression; NBC Network based classification; CV Cross validation; LOOCV Leave-one-out cross validation; PCC Pearson correlation coefficient; RMSE Root mean square error; MSE Mean square error; SCC Spearman correlation coefficient; NDCG Normalized discounted cumulative gain; R2 Coefficient of determination; NRMSE Normalized root mean squared error; AUC Area under curve; PPI Protein–protein interaction.
The Cancer Genome Atlas (TCGA) provides a high-quality resource of molecular data on a large variety of human cancers. Corces et al. used a recently modified assay to profile chromatin accessibility to determine the accessible chromatin landscape in 410 TCGA samples from 23 cancer types (see the Perspective by Taipale). When the data were integrated with other omics data available for the same tumor samples, inherited risk loci for cancer predisposition were revealed, transcription factors and enhancers driving molecular subtypes of cancer with patient survival differences were identified, and noncoding mutations associated with clinical prognosis were discovered. Science, this issue p. eaav1898; see also p. 401 Cancer is one of the leading causes of death worldwide. Although the 2% of the human genome that encodes proteins has been extensively studied, much remains to be learned about the noncoding genome and gene regulation in cancer. Genes are turned on and off in the proper cell types and cell states by transcription factor (TF) proteins acting on DNA regulatory elements that are scattered over the vast noncoding genome and exert long-range influences. The Cancer Genome Atlas (TCGA) is a global consortium that aims to accelerate the understanding of the molecular basis of cancer. TCGA has systematically collected DNA mutation, methylation, RNA expression, and other comprehensive datasets from primary human cancer tissue. TCGA has served as an invaluable resource for the identification of genomic aberrations, altered transcriptional networks, and cancer subtypes. Nonetheless, the gene regulatory landscapes of these tumors have largely been inferred through indirect means. A hallmark of active DNA regulatory elements is chromatin accessibility. Eukaryotic genomes are compacted in chromatin, a complex of DNA and proteins, and only the active regulatory elements are accessible by the cell's machinery such as TFs. ATAC-seq enables the genome-wide profiling of TF binding events that orchestrate gene expression programs and give a cell its identity. We generated high-quality ATAC-seq data in 410 tumor samples from TCGA, identifying diverse regulatory landscapes across 23 cancer types. These chromatin accessibility profiles identify cancer- and tissue-specific DNA regulatory elements that enable classification of tumor subtypes with newly recognized prognostic importance. We identify distinct TF activities in cancer based on differences in the inferred patterns of TF-DNA interaction and gene expression. Genome-wide correlation of gene expression and chromatin accessibility predicts tens of thousands of putative interactions between distal regulatory elements and gene promoters, including key oncogenes and targets in cancer immunotherapy, such as MYC, SRC, BCL2, and PDL1.