Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers
Disha Varshney, Samarth Garg, Sarthak Tyagi, Deeksha Varshney, Nayan Deep, Asif Ekbal
arXiv.org Artificial Intelligence
In this study, we tackle the challenging task of predicting secondary structure from protein primary sequences. This is a pivotal first step towards predicting tertiary structure, and it also yields crucial insights into protein activity, relationships, and functions. Existing methods often rely on extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor exploit the available protein 3D structural data, which is recognized as a decisive factor in determining protein function. To address this, we use protein residue graphs and introduce various forms of sequential and structural connections to capture richer spatial information. We combine Graph Neural Networks (GNNs) and Language Models (LMs): a pre-trained transformer-based protein language model encodes the amino acid sequence, while message-passing mechanisms such as GCN and R-GCN capture the geometric characteristics of the protein structure. By applying convolution over each node's neighborhood, taking edge relations into account, and stacking multiple convolutional layers, the model efficiently learns combined information from the protein's spatial graph, revealing intricate interconnections and dependencies within its structure. To assess the model's performance, we used the training dataset provided by NetSurfP-2.0, which labels secondary structure in both 3-state and 8-state schemes. Extensive experiments show that our proposed model, SSRGNet, surpasses the baseline in F1-score.
Introduction
Proteins are essential components of cells and are involved in applications ranging from therapeutics to materials. They are composed of a sequence of amino acids that fold into distinct shapes. With the development of affordable sequencing technologies [1, 2], a substantial number of novel protein sequences have been identified in recent years.
However, annotating the functional properties of a newly discovered protein sequence remains a laborious and expensive process. There is therefore a need for reliable and efficient computational methods that accurately predict and assign protein functions, bridging the gap between sequence information and functional knowledge. The analysis of protein structure, particularly tertiary structure, is highly significant for practical applications such as understanding protein function and designing drugs [3].
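The abstract describes encoding residues with a pre-trained protein language model and then running relation-aware message passing (R-GCN) over a residue graph that has both sequential and structural edges. The NumPy sketch below illustrates that general idea under stated assumptions; it is not SSRGNet's implementation. Random vectors stand in for language-model embeddings, the graph uses two hypothetical relation types (sequence neighbors, and C-alpha pairs closer than an assumed 8 Å cutoff), and each layer follows the standard R-GCN update of Schlichtkrull et al. (2018) with per-relation mean normalization. All helper names (`build_residue_graph`, `rgcn_layer`, `init_params`, `predict_ss`), the cutoff, and the layer sizes are illustrative.

```python
import numpy as np

# Hypothetical relation types for illustration; the paper's exact edge scheme may differ.
SEQ_EDGE, SPATIAL_EDGE = 0, 1


def build_residue_graph(ca_coords, seq_window=1, radius=8.0):
    """Return directed (src, dst, relation) edges between residues.

    Residues within `seq_window` positions get a sequential edge; other pairs whose
    C-alpha atoms are closer than `radius` angstroms (assumed cutoff) get a spatial edge.
    """
    n = len(ca_coords)
    dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if abs(i - j) <= seq_window:
                edges.append((j, i, SEQ_EDGE))      # message j -> i along the chain
            elif dist[i, j] < radius:
                edges.append((j, i, SPATIAL_EDGE))  # message j -> i through space
    return edges


def rgcn_layer(h, edges, w_self, w_rel):
    """One R-GCN layer: h_i' = ReLU(h_i W_0 + sum_r mean_{j in N_r(i)} h_j W_r)."""
    out = h @ w_self                                  # self-connection term
    for r in range(w_rel.shape[0]):
        agg = np.zeros((h.shape[0], w_rel.shape[2]))
        count = np.zeros(h.shape[0])
        for src, dst, rel in edges:
            if rel == r:
                agg[dst] += h[src] @ w_rel[r]         # relation-specific transform
                count[dst] += 1
        has_nbrs = count > 0
        agg[has_nbrs] /= count[has_nbrs][:, None]     # normalize by 1 / |N_r(i)|
        out += agg
    return np.maximum(out, 0.0)                       # ReLU


def init_params(d_in, d_hidden, num_classes, num_layers=2, num_rel=2, seed=0):
    """Randomly initialized (untrained) weights for a stack of R-GCN layers."""
    rng = np.random.default_rng(seed)
    layers, d = [], d_in
    for _ in range(num_layers):
        layers.append({
            "w_self": rng.normal(0.0, d ** -0.5, (d, d_hidden)),
            "w_rel": rng.normal(0.0, d ** -0.5, (num_rel, d, d_hidden)),
        })
        d = d_hidden
    return {"layers": layers, "w_out": rng.normal(0.0, d ** -0.5, (d, num_classes))}


def predict_ss(embeddings, ca_coords, params):
    """Stack R-GCN layers over the residue graph, then classify each residue."""
    edges = build_residue_graph(ca_coords)
    h = embeddings
    for layer in params["layers"]:
        h = rgcn_layer(h, edges, layer["w_self"], layer["w_rel"])
    logits = h @ params["w_out"]                      # one score per class per residue
    return logits.argmax(axis=-1), edges


# Toy input: an ideal alpha-helix C-alpha trace (100 degrees and 1.5 A rise per residue).
n_res = 30
theta = np.deg2rad(100.0) * np.arange(n_res)
ca = np.stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * np.arange(n_res)], axis=1)
# Stand-in for per-residue protein language model embeddings (random here).
emb = np.random.default_rng(1).normal(size=(n_res, 16))

params = init_params(d_in=16, d_hidden=8, num_classes=3)  # 3-state head
preds, edges = predict_ss(emb, ca, params)
```

Setting `num_classes=8` gives an 8-state head instead. With a single relation type, the layer reduces to a simpler GCN-like mean aggregation, which is the distinction between the GCN and R-GCN variants mentioned above. The weights here are untrained, so the predicted labels carry no meaning until the model is fitted to labeled data such as the NetSurfP-2.0 training set.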
Nov-18-2025