pseudoknot
Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction
We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules. Interpreting RNA structures as weighted graphs, we employ deep learning to estimate the probability of base pairing between nucleotide residues. Unique to our model are its massive 11-pixel kernels, which we argue provide a distinct advantage for FCNs on the specialized domain of RNA secondary structures. On a widely adopted, standardized test set comprising 1,305 molecules, the accuracy of our method exceeds that of current state-of-the-art (SOTA) secondary structure prediction software, achieving a Matthews Correlation Coefficient (MCC) 11-40% higher than that of other leading methods on overall structures and 58-400% higher on pseudoknots specifically.
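The MCC reported above can be illustrated with a minimal sketch. It assumes each structure is given as a set of base pairs `(i, j)` with `i < j`, and that every unordered pair of positions counts as one binary decision (paired or not). The function name `mcc` and this counting convention are assumptions made for illustration, not details taken from the paper.

```python
import math


def mcc(true_pairs, pred_pairs, length):
    """Matthews Correlation Coefficient between two sets of base pairs.

    Assumption: each pair is a tuple (i, j) with i < j, and all
    length * (length - 1) / 2 position pairs are scored.
    """
    total = length * (length - 1) // 2
    tp = len(true_pairs & pred_pairs)   # pairs predicted and present
    fp = len(pred_pairs - true_pairs)   # pairs predicted but absent
    fn = len(true_pairs - pred_pairs)   # pairs present but missed
    tn = total - tp - fp - fn           # everything else
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 6-nucleotide hairpin with two true pairs.
reference = {(0, 5), (1, 4)}
perfect = mcc(reference, {(0, 5), (1, 4)}, 6)
partial = mcc(reference, {(0, 5)}, 6)
```

Because the score is computed on sets of pairs rather than on a nested bracket string, the same function applies unchanged to pseudoknotted structures.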
α-HMM: A Graphical Model for RNA Folding
Zhang, Sixiang; Yang, Aaron J.; Cai, Liming
The secondary structure of a ribonucleic acid (RNA) is a higher-order structure over the primary sequence of the molecule. Nucleotides in the sequence come physically close to each other through hydrogen bonds between bases, forming canonical Watson-Crick pairs (A-U and G-C) and the wobble pair (G-U) as the fundamental components of the structure. The secondary structure is an intermediate and, to a great extent, the scaffold for higher-order interactions between nucleotides that generate the RNA tertiary, i.e., 3-dimensional, structure [7, 17]. The latter determines important RNA functions in biological processes, not only as a genetic information carrier but also in catalytic, scaffolding, structural, and regulatory roles [12, 4]. There has been abundant interest in understanding the detailed process and dynamics of how RNA folds into its structure [3]. Computational prediction of RNA secondary structure directly from its primary sequence is a very desirable step toward the prediction of RNA 3D structure. This is evidenced by the RNA Puzzles, an annual competition to predict RNA 3D structures, in which most of the methods used by participants are preceded by a phase of secondary structure prediction [24, 22, 23].
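The pairing rules named above (A-U, G-C, and the G-U wobble pair) can be written down as a small sketch. The names `CANONICAL_PAIRS`, `can_pair`, and `allowed_pairs` are hypothetical helpers introduced only for illustration; they are not part of the α-HMM model.

```python
# Watson-Crick pairs (A-U, G-C) plus the G-U wobble pair, in both orientations.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a, b):
    """Return True if nucleotides a and b can form a canonical or wobble pair."""
    return (a.upper(), b.upper()) in CANONICAL_PAIRS


def allowed_pairs(sequence):
    """List all positions (i, j), i < j, whose bases are chemically compatible.

    This only filters by base identity; it says nothing about which of
    these candidate pairs actually form in the folded molecule.
    """
    n = len(sequence)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if can_pair(sequence[i], sequence[j])
    ]


candidates = allowed_pairs("GAC")
```

A folding method then has to choose a consistent subset of such candidate pairs, which is where the prediction problem becomes hard.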
Machine Learning Tool May Help Us Better Understand RNA Viruses
E2Efold is an end-to-end deep learning model developed at Georgia Tech that can predict RNA secondary structures, an important task used in virus analysis, drug design, and other public health applications. Although the model has yet to be used in real-life applications, in research testing it has shown at least a 10 percent improvement in structure prediction accuracy compared to previous state-of-the-art methods, according to Xinshi Chen, a Georgia Tech Ph.D. student specializing in machine learning and co-developer of the new tool. "The model uses an unrolled algorithm for solving a constrained optimization as a component in the neural network architecture, so that it can directly incorporate a solution constraint, or prior knowledge, to predict the RNA base-pairing matrix," said Chen. E2Efold is not only more accurate, it is also considerably faster than current techniques. Current methods are dynamic programming based, which is a much slower approach for predicting longer RNA sequences, such as the genomic RNA in a virus. E2Efold overcomes this drawback by using a gradient-based unrolled algorithm.
RNA Secondary Structure Prediction By Learning Unrolled Algorithms
Chen, Xinshi; Li, Yu; Umarov, Ramzan; Gao, Xin; Song, Le
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction which can effectively take into account the inherent constraints in the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures compared to previous SOTA (especially for pseudoknotted structures), while being as efficient as the fastest algorithms in terms of inference time.
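The abstract does not spell out the "inherent constraints" on the base-pairing matrix. As a rough sketch, the checker below tests three structural conditions commonly imposed on such matrices: symmetry, at most one partner per base, and no sharp loops. These three conditions, the threshold `min_separation=4`, and the helper names are assumptions made for illustration. The sketch only validates a given binary matrix and does not reproduce E2Efold's unrolled optimization.

```python
def matrix_from_pairs(n, pairs):
    """Build a symmetric n x n 0/1 matrix from a list of (i, j) base pairs."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in pairs:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def check_pairing_matrix(matrix, min_separation=4):
    """Return a list of human-readable constraint violations (empty if none).

    Assumed constraints:
      1. the matrix is symmetric,
      2. each base pairs with at most one other base,
      3. paired bases are at least min_separation positions apart.
    """
    n = len(matrix)
    problems = []
    for i in range(n):
        if sum(matrix[i]) > 1:
            problems.append(f"base {i} has more than one partner")
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                problems.append(f"entries ({i}, {j}) and ({j}, {i}) differ")
            if matrix[i][j] and j - i < min_separation:
                problems.append(f"pair ({i}, {j}) forms a sharp loop")
    return problems


valid = check_pairing_matrix(matrix_from_pairs(6, [(0, 5)]))
sharp = check_pairing_matrix(matrix_from_pairs(6, [(0, 2)]))
double = check_pairing_matrix(matrix_from_pairs(6, [(0, 5), (0, 4)]))
```

Note that none of these conditions forbids crossing pairs, so a matrix that passes the check may still encode a pseudoknot.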