designability
Flows, straight but not so fast: Exploring the design space of Rectified Flows in Protein Design
Chen, Junhua, Mathis, Simon, Harris, Charles, Didi, Kieran, Lio, Pietro
Generative modeling techniques such as Diffusion and Flow Matching have achieved significant successes in generating designable and diverse protein backbones. However, many current models are computationally expensive, requiring hundreds or even thousands of function evaluations (NFEs) to yield samples of acceptable quality, which can become a bottleneck in practical design campaigns that often generate $10^4$ to $10^6$ designs per target. In image generation, Rectified Flows (ReFlow) can significantly reduce the NFEs required for a given target quality, but their application to protein backbone generation has been less studied. We apply ReFlow to improve the low-NFE performance of pretrained SE(3) flow matching models for protein backbone generation and systematically study ReFlow design choices for protein generation across data curation, training, and inference-time settings. In particular, we (1) show that ReFlow in the protein domain is particularly sensitive to the choice of coupling generation and annealing, (2) demonstrate that design choices that help ReFlow in the image domain do not directly translate to better performance on proteins, and (3) make improvements to ReFlow methodology for proteins.
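The general ReFlow recipe mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain Euclidean vectors rather than SE(3) frames, and a hypothetical `teacher_velocity(x, t)` callable standing in for a pretrained flow model; the function names are illustrative and are not taken from the paper. The sketch shows the two ingredients: generating (noise, sample) couplings with the teacher, and building straight-line regression targets from those couplings.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.

    The number of function evaluations (NFE) equals n_steps.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

def make_couplings(teacher_velocity, noise, n_steps=100):
    """Pair each noise sample with the sample the teacher maps it to."""
    return noise, euler_sample(teacher_velocity, noise, n_steps)

def reflow_training_batch(x0, x1, rng):
    """Build straight-line interpolants and their velocity targets.

    A student trained to predict `target` at (x_t, t) learns straighter
    paths, which is what makes few-step sampling viable.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return x_t, t, target
```

When the velocity field is exactly straight (constant along each path), a single Euler step reproduces the many-step result, which is the property ReFlow tries to approach.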
Distilled Protein Backbone Generation
Xie, Liyang, Zhang, Haoran, Wang, Zhendong, Tansey, Wesley, Zhou, Mingyuan
Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation tasks, offering unprecedented capabilities for de novo protein design. However, despite their generation quality, these models are limited by their generation speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore score distillation techniques, which have shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through extensive study, we have identified how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators that significantly reduce sampling time while maintaining performance comparable to their pretrained teacher model. In particular, multistep generation combined with inference-time noise modulation is key to this success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed, while achieving similar levels of designability, diversity, and novelty as the Proteina teacher model. This reduction in inference cost enables large-scale in silico protein design, thereby bringing diffusion-based models closer to real-world protein engineering applications.
ProxelGen: Generating Proteins as 3D Densities
Faltings, Felix, Stark, Hannes, Barzilay, Regina, Jaakkola, Tommi
We develop ProxelGen, a protein structure generative model that operates on 3D densities as opposed to the prevailing 3D point cloud representations. Representing proteins as voxelized densities, or proxels, enables new tasks and conditioning capabilities. We generate proteins encoded as proxels via a 3D CNN-based VAE in conjunction with a diffusion model operating on its latent space. Compared to state-of-the-art models, ProxelGen's samples achieve higher novelty, better FID scores, and the same level of designability as the training set. ProxelGen's advantages are demonstrated in a standard motif scaffolding benchmark, and we show how 3D density-based generation allows for more flexible shape conditioning.
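To make the idea of a voxelized density concrete, here is a minimal sketch that turns a set of 3D atom coordinates into a density grid by summing isotropic Gaussians. The grid size, voxel width, and Gaussian width are assumptions chosen for illustration, and the function name `voxelize` is hypothetical; the abstract does not specify how ProxelGen constructs its proxels.

```python
import numpy as np

def voxelize(coords, grid_size=16, voxel_width=2.0, sigma=1.0):
    """Render points as a (grid_size, grid_size, grid_size) density.

    coords: array of shape (n_atoms, 3), in the same units as voxel_width.
    Each atom contributes an isotropic Gaussian with standard deviation sigma.
    """
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)  # put the centroid at the origin

    # Voxel-center coordinates along one axis, symmetric about zero.
    half = grid_size * voxel_width / 2.0
    axis = np.linspace(-half + voxel_width / 2.0, half - voxel_width / 2.0, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)  # shape (G, G, G, 3)

    density = np.zeros((grid_size, grid_size, grid_size))
    for atom in centered:
        sq_dist = np.sum((grid - atom) ** 2, axis=-1)
        density += np.exp(-sq_dist / (2.0 * sigma ** 2))
    return density
```

A grid like this can be consumed by a 3D CNN, whereas a point cloud would typically call for a graph or transformer architecture.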
Improving Protein Sequence Design through Designability Preference Optimization
Xue, Fanglei, Kubaney, Andrew, Guo, Zhichun, Min, Joseph K., Liu, Ge, Yang, Yi, Baker, David
Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, because their training objective is sequence recovery, they do not guarantee designability, the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To refine sequence generation at a finer, residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving regions that already perform well. Using a curated dataset with residue-level annotations, we fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN, which achieves a nearly 3-fold increase in in silico design success rate (from 6.56% to 17.57%) on a challenging enzyme design benchmark.
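For readers unfamiliar with DPO, the standard sequence-level loss can be sketched as follows. This is the generic DPO objective, not the residue-level ResiDPO variant, whose exact form the abstract does not give. The inputs are assumed to be total log-probabilities of a preferred ("win") and a dispreferred ("lose") sequence under the model being tuned and under a frozen reference model; here the preferred sequence would be the one with the higher pLDDT. The value `beta=0.1` is an illustrative default.

```python
import numpy as np

def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Mean DPO loss over a batch of preference pairs.

    loss = -log sigmoid(beta * [(log pi(w) - log ref(w))
                                - (log pi(l) - log ref(l))])
    """
    win_gain = np.asarray(policy_logp_win) - np.asarray(ref_logp_win)
    lose_gain = np.asarray(policy_logp_lose) - np.asarray(ref_logp_lose)
    margin = beta * (win_gain - lose_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); logaddexp keeps it stable.
    return float(np.mean(np.logaddexp(0.0, -margin)))
```

The loss equals log 2 when the tuned model matches the reference, and it decreases as the tuned model shifts probability toward the preferred sequence relative to the reference.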
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Lu, Zhengxi, Cheng, Shizhuo, Jiang, Yuru, Zhang, Yan, Zhang, Min
Recent advances in protein backbone generation have achieved promising results under structural, functional, or physical constraints. However, existing methods lack the flexibility for precise topology control, limiting navigation of the backbone space. We present ProtPainter, a diffusion-based approach for generating protein backbones conditioned on 3D curves. ProtPainter follows a two-stage process: curve-based sketching and sketch-guided backbone generation. For the first stage, we propose CurveEncoder, which predicts secondary structure annotations from a curve to parametrize sketch generation. For the second stage, the sketch guides the generative process in Denoising Diffusion Probabilistic Modeling (DDPM) to generate backbones. During this process, we further introduce a fusion scheduling scheme, Helix-Gating, to control the scaling factors. For evaluation, we propose the first benchmark for topology-conditioned protein generation, introducing the Protein Restoration Task and a new metric, self-consistency Topology Fitness (scTF). Experiments demonstrate ProtPainter's ability to generate topology-fit (scTF > 0.8) and designable (scTM > 0.5) backbones, with drawing and dragging tasks showcasing its flexibility and versatility.
Proteina: Scaling Flow-based Protein Structure Generative Models
Geffner, Tomas, Didi, Kieran, Zhang, Zuobai, Reidenbach, Danny, Cao, Zhonglin, Yim, Jason, Geiger, Mario, Dallago, Christian, Kucukbenli, Emine, Vahdat, Arash, Kreis, Karsten
Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and autoguidance for protein backbones, and new adjusted training objectives. Proteina achieves state-of-the-art performance on de novo protein backbone design and produces diverse and designable proteins at unprecedented length, up to 800 residues. The hierarchical conditioning offers novel control, enabling high-level secondary-structure guidance as well as low-level fold-specific generation.
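Classifier-free guidance, one of the guidance methods named in this abstract, is commonly written as an extrapolation between a conditional and an unconditional prediction. The sketch below shows that common form applied to velocity predictions of a flow model; it is an assumption that this form matches Proteina's, since the abstract does not state the exact combination (including how autoguidance enters). The function name and arguments are illustrative.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, guidance_weight):
    """Combine conditional and unconditional velocity predictions.

    guidance_weight = 0 gives the unconditional prediction,
    guidance_weight = 1 gives the conditional prediction,
    guidance_weight > 1 extrapolates past the conditional prediction.
    """
    v_cond = np.asarray(v_cond, dtype=float)
    v_uncond = np.asarray(v_uncond, dtype=float)
    return v_uncond + guidance_weight * (v_cond - v_uncond)
```

In a sampler, `v_cond` would come from the model evaluated with a fold class label and `v_uncond` from the same model with the label dropped, at the cost of two forward passes per step.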