CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

Han, Rong, Liu, Xiaohong, Pan, Tong, Xu, Jing, Wang, Xiaoyu, Lan, Wuyang, Li, Zhenyu, Wang, Zixuan, Song, Jiangning, Wang, Guangyu, Chen, Ting

Aug-21-2024–arXiv.org Artificial Intelligence

Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features, unable to capture the binding mechanisms comprehensively. The recent emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying different-domain language models collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy for improving Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset PRA310 for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.

machine learning, natural language, prediction, (17 more...)

arXiv.org Artificial Intelligence

Aug-21-2024

arXiv.org PDF

Add feedback

Country:
- Asia > China > Beijing > Beijing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area > Genetic Disease (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks (0.68)
    - Statistical Learning (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found