Identifying DNA Sequence Motifs Using Deep Learning
Poddar, Asmita, Uzun, Vladimir, Tunbridge, Elizabeth, Haerty, Wilfried, Nevado-Holgado, Alejo
–arXiv.org Artificial Intelligence
Advancements in genomic technologies [22] have made it possible to generate vast amounts of DNA sequence data, enabling the structural and functional study of the human genome [3]. This has provided valuable insights into the genetic basis of diseases. Because splice sites play a crucial role in this genetic basis, accurately predicting their locations in DNA sequences is essential for diagnosing disease. DNA sequences are internally structured as alternating protein-coding regions, called exons, which contain the information needed to produce proteins, and non-coding regions that interrupt the coding sequence, called introns, as shown in Figure 1. Mutations in intronic regions have been associated with developmental disorders such as isolated Pierre Robin sequence [28] and with several types of cancer [31]. Hence, accurately identifying the boundaries between exons and introns, known as splice sites, has particular biological significance and healthcare implications, for example in investigating the causes of unresolved genetic diseases. Current annotations of exon and intron locations in the human genome [10] are far from complete, and much remains unknown about the DNA sequence motifs that signal the presence of a splice site. We aim to address this challenge computationally by predicting the presence of splice sites within the DNA sequence: both acceptor sites, where an exon starts (and an intron ends), and donor sites, where an exon ends (and an intron starts).
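To make the donor and acceptor definitions concrete, here is a minimal sketch that turns exon coordinates into per-position splice-site labels and one-hot encodes the sequence, a common input format for deep learning models on DNA. It is not the authors' pipeline. The function names `splice_site_labels` and `one_hot`, the 0-based half-open coordinate convention, and the choice to mark a donor at the last exonic base and an acceptor at the first exonic base are all assumptions made for illustration; the toy sequence is invented.

```python
from typing import List, Tuple

BASES = "ACGT"


def splice_site_labels(seq_len: int, exons: List[Tuple[int, int]]) -> List[str]:
    """Label each position as 'acceptor', 'donor' or 'none' (hypothetical helper).

    Assumptions: exons are 0-based, half-open [start, end), sorted and
    non-overlapping; the first exon's start and the last exon's end are
    transcript boundaries, not splice sites.
    """
    labels = ["none"] * seq_len
    for i, (start, end) in enumerate(exons):
        if i > 0:
            # Exon start that follows an intron (intron end): acceptor site.
            labels[start] = "acceptor"
        if i < len(exons) - 1:
            # Exon end that precedes an intron (intron start): donor site.
            labels[end - 1] = "donor"
    return labels


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA string over A, C, G, T; other symbols (e.g. 'N') become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    # Invented toy gene: exon 1, a 13-nt intron (starts GT, ends AG), exon 2.
    seq = "ATGCAG" + "GTAAGTCTTTCAG" + "GCTTGA"
    exons = [(0, 6), (19, 25)]
    labels = splice_site_labels(len(seq), exons)
    print([(i, lab) for i, lab in enumerate(labels) if lab != "none"])
    # → [(5, 'donor'), (19, 'acceptor')]
    print(one_hot(seq)[:2])
    # → [[1, 0, 0, 0], [0, 0, 0, 1]]
```

A classifier would then be trained to predict these labels from the encoded sequence, typically from a fixed-width window around each position; where exactly a site is marked relative to the exon/intron boundary varies between datasets, so the convention above should be checked against whatever annotation is used.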
Nov-20-2023