exon
Semantically Rich Local Dataset Generation for Explainable AI in Genomics
Barbosa, Pedro, Savisaar, Rosina, Fonseca, Alcides
Black-box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. Therefore, interpreting these models may provide novel insights into the underlying biology, supporting downstream biomedical applications. Due to their complexity, interpretable surrogate models can only be built for local explanations (e.g., of a single instance). However, accomplishing this requires generating a dataset in the neighborhood of the input, which must maintain syntactic similarity to the original data while introducing semantic variability in the model's predictions. This task is challenging due to the complex sequence-to-function relationship of DNA. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity. Our custom, domain-guided individual representation effectively constrains syntactic similarity, and we provide two alternative fitness functions that promote diversity with no computational effort. Applied to the RNA splicing domain, our approach quickly achieves good diversity and significantly outperforms a random baseline in exploring the search space, as shown on a short, proof-of-concept RNA sequence. Furthermore, we assess its generalizability and demonstrate scalability to larger sequences, resulting in a ~30% improvement over the baseline.
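The neighborhood-generation idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a mutation-only evolutionary loop stands in for Genetic Programming, an individual is a short list of single-nucleotide substitutions capped by a hypothetical `max_edits` parameter (a stand-in for the syntactic-similarity constraint), the fitness is a hypothetical novelty score (distance from an individual's prediction to its nearest neighbor in the population), and `gc_fraction` is a toy placeholder for the black-box model.

```python
import random
from typing import Callable, List, Tuple

# An individual is a list of (position, replacement base) substitutions.
Perturbation = Tuple[int, str]
BASES = "ACGT"


def apply_perturbations(seq: str, perts: List[Perturbation]) -> str:
    """Return a copy of seq with each substitution applied."""
    chars = list(seq)
    for pos, base in perts:
        chars[pos] = base
    return "".join(chars)


def mutate(perts: List[Perturbation], seq_len: int, max_edits: int,
           rng: random.Random) -> List[Perturbation]:
    """Drop, change, or add one substitution; never exceed max_edits (assumed >= 1)."""
    new = list(perts)
    r = rng.random()
    if new and (r < 0.3 or len(new) >= max_edits):
        new.pop(rng.randrange(len(new)))
    elif new and r < 0.6:
        i = rng.randrange(len(new))
        new[i] = (new[i][0], rng.choice(BASES))
    else:
        new.append((rng.randrange(seq_len), rng.choice(BASES)))
    return new


def gc_fraction(seq: str) -> float:
    """Toy placeholder for a black-box sequence-to-function model (hypothetical)."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def evolve_neighborhood(seq: str, model: Callable[[str], float],
                        pop_size: int = 20, generations: int = 30,
                        max_edits: int = 3, seed: int = 0) -> List[str]:
    """Evolve perturbed copies of seq whose predictions are spread out (pop_size >= 2)."""
    rng = random.Random(seed)
    pop = [mutate([], len(seq), max_edits, rng) for _ in range(pop_size)]
    for _ in range(generations):
        preds = [model(apply_perturbations(seq, p)) for p in pop]

        def novelty(i: int) -> float:
            # Hypothetical fitness: distance to the closest other prediction.
            return min(abs(preds[i] - preds[j]) for j in range(len(pop)) if j != i)

        # Keep the most novel half, refill with mutated copies of survivors.
        ranked = sorted(range(len(pop)), key=novelty, reverse=True)
        survivors = [pop[i] for i in ranked[: pop_size // 2]]
        children = [mutate(rng.choice(survivors), len(seq), max_edits, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return [apply_perturbations(seq, p) for p in pop]
```

Calling `evolve_neighborhood("ACGTACGTAGCTAGCTAGGATCCA", gc_fraction)` returns 20 variants, each within three substitutions of the input. A real splicing predictor and one of the paper's fitness functions would have to replace the placeholders for anything beyond illustration.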
eSkip-Finder
During the past 10 years, antisense-mediated exon skipping has proven to be a powerful tool for correcting mRNA splicing. For example, recently FDA-approved antisense oligonucleotides (ASOs), including viltolarsen, eteplirsen, golodirsen, and milasen, were developed based on exon-skipping technology. A significant challenge, however, is selecting an optimal target sequence for exon skipping. We have developed a computational method that takes into account many parameters as well as experimental data to design highly effective ASOs for exon skipping [1], and we improved this framework using a machine-learning algorithm. eSkip-Finder was developed by Shuntaro Chiba and Yasushi Okuno (Molecular Design Data Intelligence Unit, RIKEN), Dr. Yoshitsugu Aoki (Department of Molecular Therapy, National Center of Neurology and Psychiatry), and Dr. Toshifumi Yokota (Department of Medical Genetics, Faculty of Medicine and Dentistry, University of Alberta).
[1] Echigoya Y, Mouly V, Garcia L, Yokota T, Duddy W. In Silico Screening Based on Predictive Algorithms as a Design Tool for Exon Skipping Oligonucleotides in Duchenne Muscular Dystrophy.
- North America > United States (0.61)
- North America > Canada > Alberta (0.61)
- Health & Medicine > Therapeutic Area > Neurology (0.65)
- Government > Regional Government > North America Government > United States Government > FDA (0.61)
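Because an ASO binds its pre-mRNA target by base pairing, each candidate's sequence is the reverse complement of the site it covers. The sketch below only enumerates candidate target sites by tiling windows across an exon; the window `length`, the `step`, and the GC-content feature are hypothetical choices, and it does not reproduce eSkip-Finder's parameters or its machine-learning model.

```python
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Candidate(NamedTuple):
    start: int    # 0-based start of the target site within the exon
    target: str   # sense-strand target site (DNA letters)
    aso: str      # candidate oligo: reverse complement of the target
    gc: float     # GC fraction, a stand-in for real design features


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_asos(exon: str, length: int = 25, step: int = 5) -> List[Candidate]:
    """Enumerate overlapping target windows across an exon (hypothetical defaults)."""
    exon = exon.upper()
    candidates = []
    for start in range(0, len(exon) - length + 1, step):
        target = exon[start:start + length]
        gc = (target.count("G") + target.count("C")) / length
        candidates.append(Candidate(start, target, reverse_complement(target), gc))
    return candidates
```

For example, `tile_asos(exon, length=10, step=5)` on a 30-nt exon yields windows starting at positions 0, 5, 10, 15, and 20; a real design tool would then score and rank these candidates using far richer features.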
Probabilistic Inference of Alternative Splicing Events in Microarray Data
Shai, Ofer, Frey, Brendan J., Morris, Quaid D., Pan, Qun, Misquitta, Christine, Blencowe, Benjamin J.
Alternative splicing (AS) is an important and frequent step in mammalian gene expression that allows a single gene to specify multiple products, and is crucial for the regulation of fundamental biological processes. The extent of AS regulation, and the mechanisms involved, are not well understood. We have developed a custom DNA microarray platform for surveying AS levels on a large scale. We present here a generative model for the AS Array Platform (GenASAP) and demonstrate its utility for quantifying AS levels in different mouse tissues. Learning is performed using a variational expectation maximization algorithm, and the parameters are shown to correctly capture expected AS trends. A comparison with results obtained from a well-established but low-throughput experimental method demonstrates that AS levels obtained from GenASAP are highly predictive of AS levels in mammalian tissues.
- North America > Canada > Ontario > Toronto (0.15)
- Asia > Middle East > Jordan (0.04)
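As a much smaller illustration of the expectation-maximization idea, the sketch below runs plain (not variational) EM to estimate a single inclusion fraction from a one-dimensional signal, assuming each observation comes from either an inclusion or an exclusion isoform with a known Gaussian distribution. The signal model, the parameter names `mu_inc`, `mu_exc`, and `sd`, and the shared standard deviation are hypothetical simplifications; GenASAP's actual probe model and variational updates are not reproduced here.

```python
import math
from typing import Sequence


def responsibility(x: float, pi: float, mu_inc: float, mu_exc: float, sd: float) -> float:
    """Posterior probability that x came from the inclusion component."""
    # log[(1 - pi) * N(x; mu_exc, sd) / (pi * N(x; mu_inc, sd))]; with a shared sd
    # the Gaussian normalizing constants cancel.
    log_ratio = (math.log1p(-pi) - math.log(pi)
                 + ((x - mu_inc) ** 2 - (x - mu_exc) ** 2) / (2 * sd * sd))
    # Clamp so math.exp cannot overflow for extreme observations.
    return 1.0 / (1.0 + math.exp(min(log_ratio, 700.0)))


def em_inclusion_fraction(xs: Sequence[float], mu_inc: float, mu_exc: float,
                          sd: float = 1.0, pi: float = 0.5,
                          iters: int = 200, tol: float = 1e-10) -> float:
    """Maximum-likelihood mixing weight of the inclusion component via EM."""
    for _ in range(iters):
        # E-step: responsibilities under the current mixing weight.
        resp = [responsibility(x, pi, mu_inc, mu_exc, sd) for x in xs]
        # M-step: the new mixing weight is the mean responsibility,
        # kept strictly inside (0, 1) so the logs stay finite.
        new_pi = min(max(sum(resp) / len(xs), 1e-12), 1 - 1e-12)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi
```

With 30 observations at 0.0 and 70 at 5.0, `em_inclusion_fraction(xs, mu_inc=5.0, mu_exc=0.0)` converges to about 0.7, because the two components barely overlap.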
Hidden Markov Models for Human Genes
Baldi, Pierre, Brunak, Søren, Chauvin, Yves, Engelbrecht, Jacob, Krogh, Anders
Human genes are not continuous but rather consist of short coding regions (exons) interspersed with highly variable non-coding regions (introns). We apply HMMs to the problem of modeling exons and introns and detecting splice sites in the human genome. Our most interesting result so far is the detection of particular oscillatory patterns, with a minimal period of roughly 10 nucleotides, that seem to be characteristic of exon regions and may have significant biological implications.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Denmark > Capital Region > Kongens Lyngby (0.05)
- North America > United States > Minnesota (0.04)
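To make the HMM idea concrete, the sketch below labels each base with a two-state exon/intron HMM decoded by the Viterbi algorithm in log space. The paper's models also handle splice sites; here the two states and every probability are hypothetical values chosen only for illustration (exons slightly GC-rich, introns slightly AT-rich), not parameters learned from human genes, and the input is assumed to contain only A, C, G, and T.

```python
import math
from typing import Dict, List

STATES = ("exon", "intron")
# Hypothetical parameters, for illustration only.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq: str) -> List[str]:
    """Most probable exon/intron label for each base, computed in log space."""
    seq = seq.upper()
    if not seq:
        return []
    # score[s]: best log-probability of any path that ends in state s here.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for base in seq[1:]:
        new_score: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptrs[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backptrs.append(ptrs)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

With these toy parameters, `viterbi("GC" * 5 + "AT" * 6)` labels the first ten bases exon and the last twelve intron, because the emission evidence in each run outweighs the cost of one state switch.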