Interpretable Enzyme Function Prediction via Residue-Level Detection
Zhao Yang, Bing Su, Jiahao Chen, Ji-Rong Wen
arXiv.org Artificial Intelligence
Predicting multiple functions, labeled with Enzyme Commission (EC) numbers, from an enzyme sequence is of great significance but remains challenging because the task is a sparse multi-label classification problem: each enzyme is typically associated with only a few labels out of more than 6000 possible EC numbers. Existing machine learning algorithms generally learn a single fixed global representation of each enzyme and use it to classify all functions. As a result, they lack interpretability, and the fine-grained information carried by function-specific local residue fragments may be overwhelmed. Here we present ProtDETR (Protein Detection Transformer), an attention-based framework that casts enzyme function prediction as a detection problem. It uses a set of learnable functional queries to adaptively extract different local representations from the sequence of residue-level features for predicting different EC numbers. ProtDETR not only significantly outperforms existing deep learning-based enzyme function prediction methods, but also provides a new interpretable perspective: the cross-attention between queries and residue-level features automatically detects the different local regions used to identify different functions.

The development of genome sequencing technologies has unveiled a vast collection of protein sequences, but detailed functional annotations are available for only a very small number of them [2]. Evaluating the functions of protein sequences through wet-lab experiments is time-consuming, labor-intensive, and expensive, underscoring the critical need for computational methods to predict protein functions. This need is particularly acute in the study of enzymes, which catalyze various biological reactions and are central to understanding metabolic processes. In the most widely used classification scheme, the EC number system, each class of enzyme function is assigned an EC number, a four-level hierarchy that reflects the intricate organization of enzyme functions.
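The query-based detection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-head cross-attention in plain Python in which a few query vectors each pool a different weighted view of the residue-level features and then predict one label. All names (`cross_attend`, `classify`), the dimensions, the random stand-in values for learned parameters, and the trailing "no function" class (borrowed from DETR-style detectors) are assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, residues):
    """Single-head cross-attention: each query pools the residue features.

    Returns (pooled, weights); weights[q][i] is how strongly query q
    attends to residue i, which is the quantity one would inspect for
    interpretability.
    """
    scale = math.sqrt(len(residues[0]))
    pooled, weights = [], []
    for q in queries:
        w = softmax([dot(q, r) / scale for r in residues])
        weights.append(w)
        pooled.append([
            sum(w[i] * residues[i][d] for i in range(len(residues)))
            for d in range(len(q))
        ])
    return pooled, weights


def classify(pooled, class_weights):
    """Pick the highest-scoring class for each query's pooled vector."""
    labels = []
    for p in pooled:
        probs = softmax([dot(p, cw) for cw in class_weights])
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels


# Hypothetical sizes; random values stand in for learned parameters.
rng = random.Random(0)
DIM, NUM_RESIDUES, NUM_QUERIES, NUM_CLASSES = 8, 20, 3, 5

residues = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_RESIDUES)]
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]
# Assumption: one extra class (index NUM_CLASSES) means "no function",
# so unused queries can abstain when an enzyme has few labels.
class_weights = [[rng.gauss(0, 1) for _ in range(DIM)]
                 for _ in range(NUM_CLASSES + 1)]

pooled, weights = cross_attend(queries, residues)
preds = classify(pooled, class_weights)
top_residues = [max(range(NUM_RESIDUES), key=w.__getitem__) for w in weights]

for q, (label, top) in enumerate(zip(preds, top_residues)):
    print(f"query {q}: class {label}, most-attended residue {top}")
```

In this sketch, each row of `weights` sums to 1 over the residues, so the positions with the largest weights indicate which local region a given query relied on for its prediction.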
Jan-9-2025