Generative modeling for protein structures

Neural Information Processing Systems

Analyzing the structure and function of proteins is a key part of understanding biology at the molecular and cellular level. In addition, a major engineering challenge is to design new proteins in a principled and methodical way. Current computational modeling methods for protein design are slow and often require human oversight and intervention. Here, we apply Generative Adversarial Networks (GANs) to the task of generating protein structures, toward application in fast de novo protein design. We encode protein structures in terms of pairwise distances between alpha-carbons on the protein backbone, which eliminates the need for the generative model to learn translational and rotational symmetries.
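The invariance claim in the abstract above can be checked with a small sketch. This is only an illustration, not the authors' pipeline: the coordinates are made-up values standing in for alpha-carbon positions, and the helper names `pairwise_distances` and `rotate_z_and_translate` are hypothetical. It builds a pairwise distance matrix, applies a rigid motion (rotation plus translation) to the points, and confirms that the matrix is unchanged up to floating-point error.

```python
import math


def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def rotate_z_and_translate(coords, angle, shift):
    """Apply a rigid motion: rotate about the z-axis by `angle`, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Made-up coordinates (in angstroms) standing in for four alpha-carbons.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0), (8.9, 4.2, 1.7)]

original = pairwise_distances(ca_coords)
moved = pairwise_distances(
    rotate_z_and_translate(ca_coords, angle=1.1, shift=(10.0, -4.0, 2.5))
)

# Largest entry-wise difference between the two distance matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(original, moved)
    for a, b in zip(row_a, row_b)
)
print(f"largest change in any pairwise distance: {max_diff:.2e}")
```

Because every rigid copy of a structure maps to the same matrix, a model trained on such matrices never has to learn that shifted or rotated copies are equivalent. A distance matrix is also unchanged by reflection, so a mirror image of a structure is indistinguishable in this encoding.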

Global analysis of protein folding using massively parallel design, synthesis, and testing


Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2500 stable designed proteins in four basic folds, a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized.

Researchers Use Music To Study Proteins And Design New Ones


Researchers at MIT created an app that turns a protein's amino acid sequence into music, offering a new way to access complicated protein information. Based on these protein-derived musical pieces, they trained neural networks to create new music, which could then be turned back into protein structures. More than just an amusing activity, this music can train neural networks to create entirely new proteins. Understanding the exact correlation between protein structure and function is an ongoing research question in biochemistry.

GANs for Driving Biological Research


I divide the contribution of GANs to scientific research into three major categories: preparation, direction, and modeling. Preparing samples to study is probably one of the most unattractive parts of biological research. GANs are useful when we have some data and want to convert it into an image (when the generator is given input data, we use conditional GANs, or cGANs). Thus, we need to analyze where we might need to produce an image in the preparation phase. One paper [1] has used cGANs as a method for normalizing stained tissue cells for computational analysis.

Protein Design by Provable Algorithms

Communications of the ACM

Proteins are a class of large molecules that are involved in the vast majority of biological functions, from cell replication to photosynthesis to cognition. The chemical structure of proteins is very systematic [5]: they consist of a chain of atoms known as the backbone, which consists of three-atom (nitrogen-carbon-carbon) repeats known as residues, each of which features a sidechain of atoms emanating from the first carbon. In general, there are 20 different options for sidechains, and a residue with a particular type of sidechain is known as an amino acid (so there are also 20 different amino acid types). For billions of years, the process of evolution has optimized the sequence of amino acids that make up naturally occurring proteins to suit the needs of the organisms that make them. So we ask: Can we use computation to design non-naturally occurring proteins that suit our biomedical and industrial needs? This question is a combinatorial optimization problem, because the output of a protein design computation is a sequence of amino acids. Due to the vast diversity of naturally occurring proteins, it is possible, and very useful, to begin a protein design computation with a naturally occurring protein and then to modify it to achieve the desired function. In this article, we focus on protein design algorithms that perform this optimization using detailed modeling of the 3D structure of the protein [5, 8]. Thus, they will begin with a starting structure, a 3D structure of a (typically naturally occurring) protein we wish to modify. To illustrate this concept, imagine we wish to perform a simple example modification to a protein to make it more stable, so it can still function at higher temperatures.
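The combinatorial framing in the excerpt above can be made concrete with a toy sketch. This is not one of the article's algorithms, which rely on detailed 3D modeling. It assumes a made-up scoring rule (`toy_score`, which penalizes non-hydrophobic residues at hypothetical "buried" positions and hydrophobic residues elsewhere), a made-up starting sequence, and a brute-force search over a few mutable positions. It mainly shows how quickly the number of candidate sequences grows: 20 choices per position gives 20^n sequences for n positions.

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acid types
HYDROPHOBIC = set("AVILMFW")          # simplified grouping, assumed for this toy score


def sequence_space_size(num_positions):
    """Number of distinct sequences when each position can be any amino acid."""
    return len(AMINO_ACIDS) ** num_positions


def toy_score(sequence, buried):
    """Hypothetical score (lower is better): count residues that violate the
    rule 'hydrophobic if buried, non-hydrophobic if exposed'."""
    penalty = 0
    for position, amino_acid in enumerate(sequence):
        is_hydrophobic = amino_acid in HYDROPHOBIC
        if (position in buried) != is_hydrophobic:
            penalty += 1
    return penalty


def design(start, mutable_positions, buried):
    """Exhaustively try every amino acid at the mutable positions of `start`."""
    best_seq, best_score = start, toy_score(start, buried)
    for choice in itertools.product(AMINO_ACIDS, repeat=len(mutable_positions)):
        candidate = list(start)
        for position, amino_acid in zip(mutable_positions, choice):
            candidate[position] = amino_acid
        candidate = "".join(candidate)
        score = toy_score(candidate, buried)
        if score < best_score:
            best_seq, best_score = candidate, score
    return best_seq, best_score


# Made-up starting sequence; positions 1 and 3 are assumed to be buried.
start_sequence = "GKDSE"
best_seq, best_score = design(start_sequence, mutable_positions=[1, 3], buried={1, 3})

print("candidates for 2 mutable positions:", sequence_space_size(2))
print("candidates for 100 positions:", sequence_space_size(100))
print("best sequence found:", best_seq, "with score", best_score)
```

Exhaustive enumeration is feasible here only because two positions are mutable. For a protein of realistic length the search space is far too large to enumerate, which is why the problem calls for dedicated optimization algorithms.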