Diffusion Language Models Are Versatile Protein Learners

Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

arXiv.org Artificial Intelligence 

This paper introduces the diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation. We further demonstrate that the proposed diffusion generative pre-training makes DPLM possess a better understanding of proteins, making it a superior representation learner.

Drawing inspiration from the remarkable progress in NLP achieved by language models (LMs; Devlin et al., 2019; Radford et al., 2018; OpenAI, 2023), thanks to the scalability of Transformers (Vaswani et al., 2017) and the existence of large-scale text data, recent explorations in proteins have also demonstrated the impressive capabilities of protein language models (Rives et al., 2019; Lin et al., 2022; Hu et al., 2022), learned from the universe of evolutionary-scale protein sequences. As a result, protein LMs have become one of the most important cornerstones in AI for protein research, serving a pivotal role not only in predictive tasks (e.g., probing functional properties, and predicting protein structures from single sequences without explicit evolutionary homologs) but also in generative tasks (e.g., redesigning sequences given protein backbone structures, or synthesizing completely new protein sequences).
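The abstract describes pre-training within a generative self-supervised discrete diffusion framework over protein sequences. As a rough illustration only, the sketch below shows one training step of a generic masked (absorbing-state) discrete diffusion objective in PyTorch; the tiny model, 21-token vocabulary, uniform masking schedule, and masked-position cross-entropy loss are assumptions chosen for brevity and are not taken from the paper's actual implementation.

# Illustrative sketch of one training step for a masked (absorbing-state)
# discrete diffusion objective over protein sequences. Not the authors'
# implementation: the model, vocabulary, and noise schedule are hypothetical
# stand-ins for the general idea described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 21          # 20 amino acids + 1 [MASK]/absorbing token (assumed)
MASK_ID = 20             # index of the absorbing token (assumed)

class TinyDenoiser(nn.Module):
    """Stand-in Transformer encoder that predicts the original residues."""
    def __init__(self, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def diffusion_training_step(model, optimizer, seqs):
    """One step: corrupt sequences by masking a random fraction of residues
    (forward process), then train the model to recover them (reverse process)."""
    B, L = seqs.shape
    t = torch.rand(B, 1)                        # corruption level per sequence
    mask = torch.rand(B, L) < t                 # positions sent to the absorbing state
    noisy = seqs.masked_fill(mask, MASK_ID)     # x_t: partially masked sequence
    logits = model(noisy)                       # predict x_0 from x_t
    # Cross-entropy only on masked positions (one common weighting choice).
    loss = F.cross_entropy(logits[mask], seqs[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyDenoiser()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    fake_batch = torch.randint(0, 20, (8, 128))  # 8 random toy "protein" sequences
    print(diffusion_training_step(model, opt, fake_batch))

At sampling time, such a model would start from a fully masked sequence and iteratively unmask positions with its own predictions, which is one way a discrete diffusion formulation can generalize masked language modeling into a generative procedure.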
