Diffusion Language Models Are Versatile Protein Learners

Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

arXiv.org Artificial Intelligence 

This paper introduces the diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation. We further demonstrate that the proposed diffusion generative pre-training makes DPLM possess a better understanding of proteins, making it a superior representation learner.

Drawing inspiration from the remarkable progress in NLP achieved by language models (LMs; Devlin et al., 2019; Radford et al., 2018; OpenAI, 2023), thanks to the scalability of Transformers (Vaswani et al., 2017) and the existence of large-scale text data, recent explorations in proteins have also demonstrated the impressive capabilities of protein language models (Rives et al., 2019; Lin et al., 2022; Hu et al., 2022), learned from the universe of evolutionary-scale protein sequences. As a result, protein LMs have become one of the most important cornerstones in AI for protein research, serving a pivotal role not only in predictive tasks (e.g., probing functional properties, and predicting protein structures from single sequences without explicit evolutionary homologs) but also in generative tasks (e.g., redesigning sequences given protein backbone structures, or synthesizing completely new protein sequences).
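The abstract describes pre-training within a generative self-supervised discrete diffusion framework over protein sequences. As a rough illustration only, the sketch below shows one training step of a generic masked (absorbing-state) discrete diffusion objective in PyTorch; the tiny model, 21-token vocabulary, uniform masking schedule, and masked-position cross-entropy loss are assumptions chosen for brevity and are not taken from the paper's actual implementation.

# Illustrative sketch of one training step for a masked (absorbing-state)
# discrete diffusion objective over protein sequences. Not the authors'
# implementation: the model, vocabulary, and noise schedule are hypothetical
# stand-ins for the general idea described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 21          # 20 amino acids + 1 [MASK]/absorbing token (assumed)
MASK_ID = 20             # index of the absorbing token (assumed)

class TinyDenoiser(nn.Module):
    """Stand-in Transformer encoder that predicts the original residues."""
    def __init__(self, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def diffusion_training_step(model, optimizer, seqs):
    """One step: corrupt sequences by masking a random fraction of residues
    (forward process), then train the model to recover them (reverse process)."""
    B, L = seqs.shape
    t = torch.rand(B, 1)                        # corruption level per sequence
    mask = torch.rand(B, L) < t                 # positions sent to the absorbing state
    noisy = seqs.masked_fill(mask, MASK_ID)     # x_t: partially masked sequence
    logits = model(noisy)                       # predict x_0 from x_t
    # Cross-entropy only on masked positions (one common weighting choice).
    loss = F.cross_entropy(logits[mask], seqs[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyDenoiser()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    fake_batch = torch.randint(0, 20, (8, 128))  # 8 random toy "protein" sequences
    print(diffusion_training_step(model, opt, fake_batch))

At sampling time, such a model would start from a fully masked sequence and iteratively unmask positions with its own predictions, which is one way a discrete diffusion formulation can generalize masked language modeling into a generative procedure.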
