 nature biotechnology


Accelerating Prime Editing: Machine Learning Helps Design the Best Fix for a Given Genetic Flaw

#artificialintelligence

A new study published in the journal Nature Biotechnology has used machine learning to accelerate the development of prime editing, a promising gene-editing technology. The study analyzed thousands of DNA sequences introduced into the genome using prime editors, and used the data to train a machine learning algorithm to design the best fix for a given genetic flaw. By using machine learning to streamline the process of designing genetic fixes, this research could help speed up efforts to bring prime editing into clinical use. Researchers at the Wellcome Sanger Institute have developed a new tool to predict the chances of successfully inserting a gene-edited sequence of DNA into the genome of a cell, using a technique known as prime editing. An evolution of CRISPR-Cas9 gene editing technology, prime editing has huge potential to treat genetic diseases in humans, from cancer to cystic fibrosis.


Machine learning helps determine success of advanced genome editing

#artificialintelligence

A new tool to predict the chances of successfully inserting a gene-edited sequence of DNA into the genome of a cell, using a technique known as prime editing, has been developed by researchers at the Wellcome Sanger Institute. An evolution of CRISPR-Cas9 gene editing technology, prime editing has huge potential to treat genetic disease in humans, from cancer to cystic fibrosis. But thus far, the factors determining the success of edits are not well understood. The study, published today (February 16) in Nature Biotechnology, assessed thousands of different DNA sequences introduced into the genome using prime editors. These data were then used to train a machine learning algorithm to help researchers design the best fix for a given genetic flaw, which promises to speed up efforts to bring prime editing into the clinic.


Artificial intelligence can improve efficiency of genome editing

#artificialintelligence

Researchers at the University of Zurich have developed a new tool that uses artificial intelligence to predict the efficacy of various genome-editing repair options. Unintentional errors in the correction of DNA mutations of genetic diseases can thus be reduced. Genome editing technologies offer great opportunities for treating genetic diseases. Methods such as the widely used CRISPR/Cas9 gene scissors directly address the cause of the disease in the DNA. The scissors are used in the laboratory to make targeted modifications to the genetic material in cell lines and model organisms and to study biological processes.


Machine learning powers biobank-driven drug discovery - Nature Biotechnology

#artificialintelligence

Drug hunters are moving into the clinic with human-first ‘no-hypothesis’ target discovery, applying the full force of machine learning to massive collections of human omics data.


Genome-wide mapping of somatic mutation rates uncovers drivers of cancer - Nature Biotechnology

#artificialintelligence

Identification of cancer driver mutations that confer a proliferative advantage is central to understanding cancer; however, searches have often been limited to protein-coding sequences and specific non-coding elements (for example, promoters) because of the challenge of modeling the highly variable somatic mutation rates observed across tumor genomes. Here we present Dig, a method to search for driver elements and mutations anywhere in the genome. We use deep neural networks to map cancer-specific mutation rates genome-wide at kilobase-scale resolution. These estimates are then refined to search for evidence of driver mutations under positive selection throughout the genome by comparing observed to expected mutation counts. We mapped mutation rates for 37 cancer types and applied these maps to identify putative drivers within intronic cryptic splice regions, 5′ untranslated regions and infrequently mutated genes. Our high-resolution mutation rate maps, available for web-based exploration, are a resource to enable driver discovery genome-wide. Cancer driver mutations are identified by predicting neutral mutation rates across the entire genome.
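The idea of comparing observed to expected mutation counts can be illustrated with a minimal sketch. It assumes a plain Poisson model for the neutral mutation count in a region, which is a simplification chosen for illustration and not necessarily the statistical model Dig uses; the function name `poisson_sf` and the example numbers are hypothetical.

```python
import math


def poisson_sf(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Assumption: neutral mutations in a region follow a Poisson
    distribution whose mean is the expected (neutral) count.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical region: 1 neutral mutation expected, 10 observed.
p_value = poisson_sf(observed=10, expected=1.0)
print(f"p-value for excess mutations: {p_value:.3g}")
```

A small p-value means the region carries more mutations than the neutral rate predicts, which is the kind of signal that would flag it as a candidate for positive selection under this simplified model.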


Multi-omics single-cell data integration and regulatory inference with graph-linked embedding - Nature Biotechnology

#artificialintelligence

Despite the emergence of experimental methods for simultaneous measurement of multiple omics modalities in single cells, most single-cell datasets include only one modality. A major obstacle in integrating omics data from multiple modalities is that different omics layers typically have distinct feature spaces. Here, we propose a computational framework called GLUE (graph-linked unified embedding), which bridges the gap by modeling regulatory interactions across omics layers explicitly. Systematic benchmarking demonstrated that GLUE is more accurate, robust and scalable than state-of-the-art tools for heterogeneous single-cell multi-omics data. We applied GLUE to various challenging tasks, including triple-omics integration, integrative regulatory inference and multi-omics human cell atlas construction over millions of cells, where GLUE was able to correct previous annotations. GLUE features a modular design that can be flexibly extended and enhanced for new analysis tasks. The full package is available online at https://github.com/gao-lab/GLUE. Different single-cell data modalities are integrated at atlas-scale by modeling regulatory interactions.


scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning - Nature Biotechnology

#artificialintelligence

Single-cell multiomics data continues to grow at an unprecedented pace. Although several methods have demonstrated promising results in integrating several data modalities from the same tissue, the complexity and scale of data compositions present in cell atlases still pose a challenge. Here, we present scJoint, a transfer learning method to integrate atlas-scale, heterogeneous collections of scRNA-seq and scATAC-seq data. scJoint leverages information from annotated scRNA-seq data in a semisupervised framework and uses a neural network to train simultaneously on labeled and unlabeled data, allowing label transfer and joint visualization in an integrative framework. Using atlas data as well as multimodal datasets generated with ASAP-seq and CITE-seq, we demonstrate that scJoint is computationally efficient and consistently achieves substantially higher cell-type label accuracy than existing methods while providing meaningful joint visualizations. Thus, scJoint overcomes the heterogeneity of different data modalities to enable a more comprehensive understanding of cellular phenotypes. Integration of data from single-cell RNA-seq and ATAC-seq is achieved with transfer learning.


Artificial intelligence tool enriches a gold-mine in cancer genomics

#artificialintelligence

The fragments of cancer DNA analyzed by the authors of this new study originate from the human genome, the sequence of which results from millions of years of evolution, and has been shaped by "copy-paste-edit" processes and co-evolution with parasitic elements. For example, 8% of our DNA comes from past viral infections. The tortuous mutational processes that have shaped our genomes intensify and become life-threatening in the genomes of cancer cells, leading to anarchic cell mutation and proliferation. The repeated sequences of DNA in our genomes are not only a fossil of our past evolution, but also hold a track record of how a cancer has evolved, which helps scientists understand and study cancer development and progression. Current technologies allow scientists to read and piece together billions of short DNA sequences to study cancer genomes and identify mutations within them.


Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

arXiv.org Machine Learning

The emerging field of precision oncology relies on the accurate pinpointing of alterations in the molecular profile of a tumor to provide personalized targeted treatments. Current methodologies in the field commonly include the application of next generation sequencing technologies to a tumor sample, followed by the identification of mutations in the DNA known as somatic variants. The differentiation of these variants from sequencing error poses a classic classification problem, which has traditionally been approached with Bayesian statistics, and more recently with supervised machine learning methods such as neural networks. Although these methods provide greater accuracy, classic neural networks lack the ability to indicate the confidence of a variant call. In this paper, we explore the performance of deep Bayesian neural networks on next generation sequencing data, and their ability to give probability estimates for somatic variant calls. In addition to demonstrating similar performance in comparison to standard neural networks, we show that the resultant output probabilities make these better suited to the disparate and highly-variable sequencing data-sets these models are likely to encounter in the real world. We aim to deliver algorithms to oncologists for which model certainty better reflects accuracy, for improved clinical application. By moving away from point estimates to reliable confidence intervals, we expect the resultant clinical and treatment decisions to be more robust and more informed by the underlying reality of the tumor molecular profile.
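One common way to check whether model certainty reflects accuracy is a calibration measure such as the expected calibration error. The sketch below is an illustration under that assumption and is not taken from the paper; the function name, the binning scheme, and the example values are hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary expected calibration error (illustrative sketch).

    probs  : predicted probabilities that a call is a true somatic variant
    labels : 1 if the call was a true variant, 0 if it was a sequencing error
    Assumption: equal-width probability bins, each weighted by its share of calls.
    """
    total = len(probs)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_true = sum(y for _, y in members) / len(members)
        # Gap between stated confidence and observed frequency in this bin.
        ece += (len(members) / total) * abs(mean_conf - frac_true)
    return ece


# Hypothetical calls: ten predictions at 0.9, nine of which were correct.
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))
```

A value near zero indicates that stated probabilities match observed frequencies; larger values indicate over- or under-confident calls.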


Scientists develop more accurate method to find good targets for cancer immunotherapy

#artificialintelligence

Ludwig Cancer Research scientists have developed a new and more accurate method to identify the molecular signs of cancer likely to be presented to helper T cells, which stimulate and orchestrate the immune response to tumors and infectious agents. The study, led by David Gfeller and Michal Bassani-Sternberg of the Lausanne Branch of the Ludwig Institute for Cancer Research, is reported in the current issue of Nature Biotechnology. The new method combines two powerful new technologies. One is a mass spectrometry technology developed by Bassani-Sternberg's lab to rapidly and inexpensively obtain the amino acid sequences of thousands of peptide antigens (protein fragments) bound to a molecular complex known as HLA that is expressed on the surface of cells. The other is a novel computational tool developed in Gfeller's lab that is based on machine learning, the computational approach that powers face-recognition software, among other things.