AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB

While DeepMind has tentatively solved protein folding, its inverse problem -- protein design, which predicts protein sequences from their 3D structures -- still faces significant challenges. In particular, the lack of a large-scale standardized benchmark and poor accuracy hinder research progress. To standardize comparisons and draw more research interest, we use AlphaFold DB, one of the world's largest protein structure databases, to establish a new graph-based benchmark -- AlphaDesign. Based on AlphaDesign, we propose a new method called ADesign that improves accuracy by introducing protein angles as new features, using a simplified graph transformer encoder (SGT), and proposing a confidence-aware protein decoder (CPD). SGT and CPD also improve model efficiency by simplifying the training and testing procedures. Experiments show that ADesign significantly outperforms previous graph models, e.g., the average accuracy is improved by 8%, and inference is more than 40 times faster than before.
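The abstract does not say which angles ADesign uses as features. As an illustration only, here is a minimal sketch of one common angular feature in protein modelling: the dihedral (torsion) angle defined by four consecutive points, such as backbone atoms. The function name `dihedral` and the toy coordinates are assumptions for this example, not taken from the paper.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians for four 3D points (hypothetical helper)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so the projections below are well defined.
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.atan2(y, x)

# Toy coordinates (assumed): a planar "cis" arrangement and a "trans" one.
cis = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1])
trans = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
```

In a graph model, angles like this are often encoded as sine and cosine pairs so that the feature stays continuous across the wrap-around at ±π; whether ADesign does so is not stated in the abstract.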

Global analysis of protein folding using massively parallel design, synthesis, and testing


Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds--a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized.

What Is a Transformer Model?


If you want to ride the next big wave in AI, grab a transformer. A transformer model is a neural network that learns context, and thus meaning, by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect the subtle ways in which even distant data elements in a series influence and depend on each other. First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date. They're driving a wave of advances in machine learning that some have dubbed transformer AI.
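To make the idea of self-attention concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes identity projections (queries, keys, and values are all the input vectors themselves), whereas real transformers learn separate projection matrices. The names `softmax` and `self_attention` and the toy token vectors are illustrative assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption).

    x is a list of n vectors, each of dimension d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:
        # Similarity of this position's query to every position's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights

# Toy input (assumed): positions 0 and 2 carry the same vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy input, position 0 attends equally to positions 0 and 2 and less to position 1, even though position 2 is not adjacent to it. This is the sense in which attention relates distant elements of a series.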

PDBench: Evaluating Computational Methods for Protein Sequence Design


Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, serving as the basis of highly performant materials, sensing, and much more. While an incredible range of functionality has been sampled in nature, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could search for novel proteins with useful properties that we could apply to tackle the environmental and medical challenges facing humanity. This is the purpose of protein design. Sequence design is an important aspect of protein design, and many successful methods for it have been developed.