 chromatin


Simulation-based inference of yeast centromeres

Touron, Eloïse; Rodrigues, Pedro L. C.; Arbel, Julyan; Varoquaux, Nelle; Arbel, Michael

arXiv.org Machine Learning

Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding can lead to malfunctions and, over time, to disease. In eukaryotes, centromeres are essential for proper chromosome segregation and folding. Despite extensive research using de novo genome sequencing and annotation analysis, centromere locations in yeasts remain difficult to infer and are still unknown in most species. Recently, genome-wide chromosome conformation capture coupled with next-generation sequencing (Hi-C) has become one of the leading methods for investigating chromosome structure. Some recent studies have used Hi-C data to give a point estimate of each centromere's location, but these approaches rely heavily on a good pre-localization. Here, we present a novel approach that infers, in a stochastic manner, the locations of all centromeres in budding yeast from both the experimental Hi-C map and simulated contact maps.
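The abstract does not spell out the inference procedure, so the following is only a hedged illustration of the general idea of simulation-based inference, not the authors' method. It is a minimal rejection ABC (approximate Bayesian computation) sketch on a single toy chromosome: draw candidate centromere bins from a uniform prior, simulate a contact map for each, and keep the draws whose maps lie closest to the "observed" map. The simulator `simulate_contact_map`, its distance-decay background, the Gaussian contact enrichment around the centromere, the noise level, the Frobenius-norm distance and the 1% acceptance rule are all hypothetical assumptions.

```python
import numpy as np

def simulate_contact_map(cen, rng, n_bins=50, strength=1.0, width=2.0, noise=0.05):
    """Hypothetical toy simulator (an assumption, not a real Hi-C model):
    a distance-decay background plus a contact enrichment centred on bin `cen`."""
    idx = np.arange(n_bins)
    # Background: contact frequency decays with genomic distance |i - j|.
    background = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
    # Enrichment: a separable Gaussian bump around the (cen, cen) entry.
    g = np.exp(-((idx - cen) ** 2) / (2.0 * width ** 2))
    signal = background + strength * np.outer(g, g)
    # Symmetric Gaussian measurement noise, since contact maps are symmetric.
    e = rng.normal(0.0, noise, size=(n_bins, n_bins))
    return signal + (e + e.T) / 2.0

def rejection_abc(observed, rng, n_draws=2000, keep_frac=0.01):
    """Rejection ABC: sample centromere bins from a uniform prior, simulate a
    map for each, and keep the draws closest (Frobenius norm) to `observed`."""
    n_bins = observed.shape[0]
    draws = rng.integers(0, n_bins, size=n_draws)  # uniform prior over bins
    dists = np.array([
        np.linalg.norm(simulate_contact_map(c, rng, n_bins) - observed)
        for c in draws
    ])
    n_keep = max(1, int(keep_frac * n_draws))
    return draws[np.argsort(dists)[:n_keep]]  # approximate posterior samples

rng = np.random.default_rng(0)
observed = simulate_contact_map(17, rng)  # stand-in for an experimental map
posterior = rejection_abc(observed, rng)
print(np.median(posterior), np.bincount(posterior))
```

Returning a set of accepted samples rather than a single best match is what makes this a (toy) posterior: its spread shows how strongly the data constrain the centromere position.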


Reviews: Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

Neural Information Processing Systems

The paper presents a novel method for predicting gene regulation using an LSTM with an attention mechanism. The model has two levels: the first is applied to the bins of each histone modification (HM), and the second is applied across multiple HMs. At each level, attention is used to focus on the most important bins and HMs. In the experiments, the proposed method improves AUC scores over baseline models including a CNN, an LSTM, and a CNN with an attention mechanism. This is an interesting paper which shows that an LSTM with an attention mechanism can predict gene regulation.
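To make the two-level structure concrete, here is a minimal NumPy sketch of hierarchical attention pooling, assuming the hidden states for each HM's bins are already available (random stand-ins below, where the paper would use LSTM outputs). Each level scores its items with a vector, normalises the scores with a softmax, and returns the weighted sum. The single shared scoring vector per level (`w_bin`, `w_hm`) and all function names are hypothetical simplifications, not the reviewed model's exact formulation.

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(h, w):
    """Score each row of h (n_items x d) with vector w, softmax-normalise,
    and return the attention-weighted sum together with the weights."""
    alpha = softmax(h @ w)
    return alpha @ h, alpha

def two_level_attention(bin_states, w_bin, w_hm):
    """bin_states: (n_hms, n_bins, d) hidden states, e.g. one LSTM per HM.
    Level 1 pools bins within each HM; level 2 pools across HMs."""
    hm_vectors, bin_weights = zip(*(attention_pool(h, w_bin) for h in bin_states))
    hm_matrix = np.stack(hm_vectors)  # (n_hms, d): one summary vector per HM
    gene_vector, hm_weights = attention_pool(hm_matrix, w_hm)
    return gene_vector, np.stack(bin_weights), hm_weights

rng = np.random.default_rng(0)
n_hms, n_bins, d = 5, 100, 8
states = rng.normal(size=(n_hms, n_bins, d))
gene_vec, bin_w, hm_w = two_level_attention(states, rng.normal(size=d), rng.normal(size=d))
print(gene_vec.shape, bin_w.shape, hm_w.shape)  # (8,) (5, 100) (5,)
```

The returned weights are what makes such a model inspectable: `bin_w` shows which bins mattered within each HM, and `hm_w` shows which HMs mattered for the gene.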


Obtaining genetics insights from deep learning via explainable artificial intelligence - Nature Reviews Genetics

Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets. In this Review, the authors describe advances in deep learning approaches in genomics, whereby researchers are moving beyond the typical ‘black box’ nature of models to obtain biological insights through explainable artificial intelligence (xAI).
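One widely used perturbation-based interpretation technique in genomics is in silico mutagenesis (ISM): mutate each position of an input sequence to every alternative base and record how the model's output changes. The sketch below applies ISM to a hypothetical `toy_model` (a max-over-positions motif score standing in for a trained network); the motif weights and sequence are illustrative assumptions, and the review itself may categorise and implement such methods differently.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix in ACGT order."""
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        x[i, BASES.index(b)] = 1.0
    return x

def toy_model(x, motif_weights):
    """Hypothetical stand-in for a trained network: the best motif
    (PWM-like) match score over all windows of the sequence."""
    k = motif_weights.shape[0]
    scores = [np.sum(x[i:i + k] * motif_weights) for i in range(x.shape[0] - k + 1)]
    return max(scores)

def in_silico_mutagenesis(seq, model):
    """Return an (L, 4) array of output changes when each position is mutated
    to each base (entries for the reference base stay 0)."""
    ref_x = one_hot(seq)
    ref = model(ref_x)
    delta = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j in range(4):
            if ref_x[i, j] == 1.0:
                continue  # skip the reference base
            mut = ref_x.copy()
            mut[i] = 0.0
            mut[i, j] = 1.0
            delta[i, j] = model(mut) - ref
    return delta

motif = one_hot("TATA") * 2.0 - 0.5  # hypothetical weights favouring "TATA"
seq = "GCGCTATAGCGC"
effects = in_silico_mutagenesis(seq, lambda x: toy_model(x, motif))
print(effects.min(axis=1))  # strongest drops fall on the TATA positions 4-7
```

ISM needs no access to model internals, only repeated forward passes, which is why it is easy to apply but costs roughly 3 × L model evaluations per sequence.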


GPU and Machine Learning Identify Spots on DNA That Are Likely to Mutate

Researching genomes is a laborious process that requires looking at chromatin, a mix of DNA and protein inside chromosomes. In 2013, scientists invented the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq), a method of rooting around in chromatin to see what's going on. The problem is that ATAC-Seq takes hours and produces lots of noisy data. Even with high-precision scientific tools, folded-up sequences of DNA are hard to sort through.


Nucleosome positioning: resources and tools online

Teif, Vladimir B.

arXiv.org Machine Learning

This is the author's version, which is being continuously updated and is not synchronised with the journal version. The final printed version will appear in Briefings in Bioinformatics.

Abstract

Nucleosome positioning is an important process required for proper genome packing and for the genome's accessibility, so that the genetic program can be executed in a cell-specific, timely manner. In recent years, hundreds of papers have been devoted to the bioinformatics, physics and biology of nucleosome positioning. The purpose of this review is to cover a practical aspect of this field, namely to provide a guide to the multitude of nucleosome positioning resources available online. These include almost 300 experimental datasets of genome-wide nucleosome occupancy profiles determined in different cell types and more than 40 computational tools for the analysis of experimental nucleosome positioning data and the prediction of intrinsic nucleosome formation probabilities from the DNA sequence. A manually curated, up-to-date list of these resources will be maintained at http://generegulation.info.

1 Introduction

The nucleosome is the basic unit of chromatin compaction, composed of the histone octamer and 146-147 base pairs (bp) of DNA wrapped around it. Nucleosomes can form at any genomic location, but some DNA sequences have higher affinity to the histone octamer, mostly due to the differential bending properties of the DNA double helix. In addition, nucleosome positioning is cell type-specific, in the sense that cells of the same organism sharing the same genome can have different nucleosome locations depending on the cell type and state. Interested readers are directed to a number of recent publications reviewing the biological, physical and bioinformatics aspects of these phenomena, which are outside the scope of the current work [1-32].
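The link between nucleosome positions and an occupancy profile can be illustrated with a short sketch: given nucleosome dyad (centre) positions and the ~147 bp of DNA each nucleosome covers, count how many nucleosomes overlap every base pair. The function name, the choice of 147 bp with 73 bp on each side of the dyad, and the difference-array approach are assumptions for illustration, not a tool from the review.

```python
import numpy as np

NUC_LEN = 147  # bp of DNA wrapped around the histone octamer (assumed here)

def occupancy_profile(dyads, genome_len, nuc_len=NUC_LEN):
    """Hypothetical helper: for every base pair, count how many nucleosomes
    (given by their dyad positions) cover it. Each nucleosome spans nuc_len bp."""
    half = nuc_len // 2  # 73 bp on each side of the dyad for 147 bp
    diff = np.zeros(genome_len + 1, dtype=int)
    for d in dyads:
        start = max(0, d - half)
        end = min(genome_len, d + half + 1)  # exclusive end, clipped to genome
        diff[start] += 1  # coverage starts here
        diff[end] -= 1    # and stops here
    return np.cumsum(diff)[:genome_len]

profile = occupancy_profile([100, 200], genome_len=300)
print(profile[100], profile[150], profile[250])  # 1 2 1
```

The difference array keeps the cost linear in genome length plus the number of nucleosomes, rather than adding 147 values per nucleosome.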
Here we omit fundamental scientific questions and focus on a very practical aspect of the field: which experimental nucleosome positioning datasets already exist, how to generate your own data, and how to compare these with other experimental datasets and with bioinformatically predicted nucleosome positions in a given genome.

1. Available experimental datasets

Recent high-throughput genome-wide data on nucleosome positioning come from a number of related techniques, which share the idea of cutting DNA between nucleosomes and mapping the protected DNA regions. The most frequently used method is MNase-seq (chromatin digestion by micrococcal nuclease followed by deep sequencing) [11, 33-35].
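As a hedged illustration of how protected regions become nucleosome position estimates, the sketch below takes paired-end MNase-seq fragments, keeps those of roughly mononucleosome length, and counts fragment midpoints as candidate dyad positions. The half-open 0-based `(start, end)` coordinates, the 120-180 bp size window and the helper name `dyads_from_fragments` are all hypothetical choices; real pipelines differ in their thresholds and processing.

```python
from collections import Counter

def dyads_from_fragments(fragments, min_len=120, max_len=180):
    """Hypothetical helper: `fragments` are (start, end) half-open coordinates
    of paired-end MNase-seq fragments. Fragments whose length falls in the
    assumed mononucleosome window are reduced to their midpoints, which are
    counted as estimates of nucleosome dyad positions."""
    counts = Counter()
    for start, end in fragments:
        length = end - start
        if min_len <= length <= max_len:
            counts[(start + end) // 2] += 1
    return counts

frags = [(0, 147), (1, 146), (10, 157), (500, 540), (1000, 1300)]
print(dyads_from_fragments(frags))  # Counter({73: 2, 83: 1})
```

Short and long fragments (here 40 bp and 300 bp) are discarded because they are unlikely to represent a single protected nucleosome.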