Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction
Neural Information Processing Systems
Self-supervised pre-training methods for proteins have recently gained attention, but most approaches model either protein sequences or structures alone. They neglect the joint distribution of the two, which is crucial for a comprehensive understanding of protein function because it integrates co-evolutionary information with structural characteristics. In this work, inspired by the success of denoising diffusion models in generative tasks, we propose DiffPreT, an approach that pre-trains a protein encoder through joint sequence-structure diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from perturbed ones along the joint diffusion trajectory, so that the encoder learns the joint distribution of sequences and structures. Because proteins undergo essential conformational variations, we further enhance DiffPreT with Siamese Diffusion Trajectory Prediction (SiamDiff), a method that captures the correlation between different conformers of a protein.
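To make the idea of perturbing a sequence and a structure along a joint diffusion trajectory concrete, here is a minimal sketch of one forward noising step. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the noise schedules, so the linear beta schedule, the Gaussian corruption of coordinates, the mask-token corruption of residue types, and the names `linear_beta_schedule`, `perturb`, and `MASK` are all hypothetical choices made here.

```python
import numpy as np

# Hypothetical index for a mask token; indices 0..19 stand for the 20 amino acid types.
MASK = 20


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule, as commonly used in denoising diffusion models."""
    return np.linspace(beta_start, beta_end, num_steps)


def perturb(coords, residues, t, alpha_bar, rng):
    """Jointly corrupt a structure and a sequence at diffusion step t.

    coords:    (N, 3) array of residue coordinates (the native structure).
    residues:  (N,) integer array of residue types (the native sequence).
    alpha_bar: cumulative products of (1 - beta), one entry per step.
    """
    num_steps = len(alpha_bar)

    # Structure: standard Gaussian forward process on the coordinates.
    noise = rng.standard_normal(coords.shape)
    noisy_coords = np.sqrt(alpha_bar[t]) * coords + np.sqrt(1.0 - alpha_bar[t]) * noise

    # Sequence (assumption): mask each residue independently with a
    # probability that grows linearly with t and reaches 1 at the last step.
    mask_prob = (t + 1) / num_steps
    mask = rng.random(residues.shape) < mask_prob
    noisy_residues = np.where(mask, MASK, residues)

    return noisy_coords, noisy_residues, noise, mask
```

In a pre-training loop of this kind, an encoder would receive `noisy_coords` and `noisy_residues` and be trained to recover the native coordinates (or the added `noise`) and the residue types at the masked positions. The SiamDiff extension that correlates different conformers is not sketched here.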
Feb-11-2025, 02:39:50 GMT