MMSite: A Multi-modal Framework for the Identification of Active Sites in Proteins
Neural Information Processing Systems
The accurate identification of active sites in proteins is essential for advancing the life sciences and pharmaceutical development, as these sites are critical for enzyme activity and drug design. Recent advances in protein language models (PLMs), trained on extensive datasets of amino acid sequences, have significantly improved our understanding of proteins. However, compared with the abundant protein sequence data, functional annotations, especially precise per-residue annotations, are scarce, which limits the performance of PLMs. On the other hand, textual descriptions of proteins, whether annotated by human experts or generated by a pretrained protein sequence-to-text model, provide meaningful context that can assist functional annotation tasks such as localizing active sites. Building on such paired sequence-text data, we propose MMSite, a multi-modal framework that improves the ability of PLMs to identify active sites by leveraging biomedical language models (BLMs).
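The abstract does not say how MMSite combines the two modalities. Purely as a hypothetical illustration of the general idea of text-conditioned per-residue prediction, the sketch below lets each residue embedding (a stand-in for PLM output) attend over text token embeddings (a stand-in for BLM output), adds the attended text vector back to the residue embedding, and applies a linear layer plus sigmoid to get a per-residue active-site probability. Every function name, dimension, and weight here is an assumption; this is not the authors' architecture.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(residue, text_tokens):
    """Scaled dot-product attention of one residue over all text tokens."""
    d = len(residue)
    weights = softmax([dot(residue, t) / math.sqrt(d) for t in text_tokens])
    # Weighted sum of text token vectors, dimension by dimension.
    return [sum(w * t[i] for w, t in zip(weights, text_tokens)) for i in range(d)]


def predict_active_sites(residue_embs, text_embs, w, b):
    """Hypothetical head: per-residue probability of being an active site."""
    probs = []
    for r in residue_embs:
        context = cross_attend(r, text_embs)
        fused = [x + c for x, c in zip(r, context)]  # residual fusion
        logit = dot(w, fused) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    # Random stand-ins: 5 residue embeddings and 3 text token embeddings.
    residues = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
    text = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
    w = [random.gauss(0, 0.5) for _ in range(dim)]
    probs = predict_active_sites(residues, text, w, b=0.0)
    print([int(p > 0.5) for p in probs])  # one 0/1 label per residue
```

The residual addition keeps the sequence representation intact when the text contributes little signal; in practice the projection weights would be learned from labeled residues rather than sampled at random.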
May-27-2025, 01:12:55 GMT