A Multimodal Human Protein Embeddings Database: DeepDrug Protein Embeddings Bank (DPEB)
Sajol, Md Saiful Islam, Rajasekaran, Magesh, Gemeinhardt, Hayden, Bess, Adam, Alvin, Chris, Mukhopadhyay, Supratik
arXiv.org Artificial Intelligence
Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual amino acid patterns (ESM-2: Evolutionary Scale Modeling), and sequence-based n-gram statistics (ProtVec). AlphaFold2 protein structures are available through public databases (e.g., the AlphaFold Protein Structure Database), but the network's internal embeddings are not. DPEB addresses this gap by providing AlphaFold2-derived embeddings for computational modeling. Our benchmark evaluations show that GraphSAGE with BioEmbeddings achieved the highest PPI prediction performance (87.37% AUROC, 79.16% accuracy). The framework also achieved 77.42% accuracy for enzyme classification and 86.04% accuracy for protein family classification. DPEB supports multiple graph neural network methods for PPI prediction, enabling applications in systems biology, drug target identification, pathway analysis, and disease mechanism studies.
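To make the GraphSAGE-based PPI setup concrete, here is a minimal sketch of one GraphSAGE layer with a mean aggregator, followed by a dot-product interaction score. It is not the authors' pipeline. The protein IDs, the 8-dimensional random vectors standing in for DPEB embeddings, the toy adjacency list, and the untrained random weights are all assumptions made for illustration.

```python
import math
import random

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(weight, vec):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def sage_layer(features, neighbors, weight):
    """One GraphSAGE layer with the mean aggregator.

    For each node: concatenate its own vector with the mean of its
    neighbors' vectors, apply a linear map, ReLU, then L2-normalize.
    Nodes without neighbors use a zero vector as the neighbor mean.
    """
    out = {}
    for node, h_self in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            h_nbr = mean_vector([features[n] for n in nbrs])
        else:
            h_nbr = [0.0] * len(h_self)
        z = [max(0.0, v) for v in matvec(weight, h_self + h_nbr)]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid divide-by-zero
        out[node] = [v / norm for v in z]
    return out

def interaction_score(h_u, h_v):
    """Sigmoid of the dot product, read as an interaction probability."""
    dot = sum(a * b for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-dot))

# Toy inputs (all hypothetical): random vectors stand in for real embeddings.
rng = random.Random(0)
proteins = ["P1", "P2", "P3", "P4"]
in_dim, out_dim = 8, 4
features = {p: [rng.gauss(0, 1) for _ in range(in_dim)] for p in proteins}
neighbors = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"], "P4": []}
# Untrained weights; a real model would learn these from known PPIs.
weight = [[rng.gauss(0, 0.5) for _ in range(2 * in_dim)] for _ in range(out_dim)]

hidden = sage_layer(features, neighbors, weight)
score = interaction_score(hidden["P1"], hidden["P4"])
```

In practice the weights would be trained on labeled interacting and non-interacting pairs, and the input vectors would come from one of the four embedding types rather than a random generator.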
Oct-28-2025
- Country:
  - North America > United States
    - Louisiana > East Baton Rouge Parish
      - Baton Rouge (0.04)
    - Nevada (0.04)
    - South Carolina > Greenville County
      - Greenville (0.04)
- Genre:
  - Research Report (0.83)