AITopics

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
(4 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsFeb-11-2026, 09:05:25 GMT

39ca8893ea38905a9d2ffe786e85af0f-Paper-Conference.pdf

large language model, machine learning, natural language, (21 more...)

Country:

North America > Canada > Quebec > Montreal (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Artificial IntelligenceOct-30-2025

Flows, straight but not so fast: Exploring the design space of Rectified Flows in Protein Design

Chen, Junhua, Mathis, Simon, Harris, Charles, Didi, Kieran, Lio, Pietro

Generative modeling techniques such as Diffusion and Flow Matching have achieved significant successes in generating designable and diverse protein backbones. However, many current models are computationally expensive, requiring hundreds or even thousands of function evaluations (NFEs) to yield samples of acceptable quality, which can become a bottleneck in practical design campaigns that often generate $10^4\ -\ 10^6$ designs per target. In image generation, Rectified Flows (ReFlow) can significantly reduce the required NFEs for a given target quality, but their application in protein backbone generation has been less studied. We apply ReFlow to improve the low NFE performance of pretrained SE(3) flow matching models for protein backbone generation and systematically study ReFlow design choices in the context of protein generation in data curation, training and inference time settings. In particular, we (1) show that ReFlow in the protein domain is particularly sensitive to the choice of coupling generation and annealing, (2) demonstrate how useful design choices for ReFlow in the image domain do not directly translate to better performance on proteins, and (3) make improvements to ReFlow methodology for proteins.

artificial intelligence, machine learning, reflow, (14 more...)

2510.24732

Country: Europe > United Kingdom > England (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Neural Information Processing SystemsOct-10-2025, 09:20:26 GMT

8e906a69bcdfc3b9be15995b525ddda7-Paper-Conference.pdf

algebra, backbone, vector, (17 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
(4 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsOct-9-2025, 23:38:11 GMT

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

In proteins, the amino-acid sequences determine the interaction between protein backbones and side chains, which fold into a distribution of protein structures. Consequently, the functional properties of protein structures can be inferred from its sequence.

protein, representation, sequence, (17 more...)

Country:

North America > Canada > Quebec > Montreal (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Machine LearningOct-8-2025

Distilled Protein Backbone Generation

Xie, Liyang, Zhang, Haoran, Wang, Zhendong, Tansey, Wesley, Zhou, Mingyuan

Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation tasks, offering unprecedented capabilities for de novo protein design. However, while achieving notable performance in generation quality, these models are limited by their generating speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore the techniques of score distillation, which has shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through extensive study, we have identified how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators which significantly reduce sampling time, while maintaining comparable performance to their pretrained teacher model. In particular, multistep generation combined with inference time noise modulation is key to the success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed, while achieving similar levels of designability, diversity, and novelty as the Proteina teacher model. This reduction in inference cost enables large-scale in silico protein design, thereby bringing diffusion-based models closer to real-world protein engineering applications.

designability, generator, pretrained model, (13 more...)

arXiv.org Machine Learning

2510.03095

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > France (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Faltings, Felix, Stark, Hannes, Barzilay, Regina, Jaakkola, Tommi

ProxelGen: Generating Proteins as 3D Densities

arXiv.org Artificial IntelligenceJun-25-2025

We develop ProxelGen, a protein structure generative model that operates on 3D densities as opposed to the prevailing 3D point cloud representations. Representing proteins as voxelized densities, or proxels, enables new tasks and conditioning capabilities. We generate proteins encoded as proxels via a 3D CNN-based VAE in conjunction with a diffusion model operating on its latent space. Compared to state-of-the-art models, ProxelGen's samples achieve higher novelty, better FID scores, and the same level of designability as the training set. ProxelGen's advantages are demonstrated in a standard motif scaffolding benchmark, and we show how 3D density-based generation allows for more flexible shape conditioning.

artificial intelligence, machine learning, representation, (19 more...)

2506.1982

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial IntelligenceJun-3-2025

Improving Protein Sequence Design through Designability Preference Optimization

Xue, Fanglei, Kubaney, Andrew, Guo, Zhichun, Min, Joseph K., Liu, Ge, Yang, Yi, Baker, David

Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, as the training objective was sequence recovery, it does not guarantee designability--the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To further refine sequence generation at a finer, residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving regions that already perform well. Using a curated dataset with residue-level annotations, we fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN, which achieves a nearly 3-fold increase in in silico design success rate (from 6.56% to 17.57%) on a challenging enzyme design benchmark.

large language model, machine learning, natural language, (18 more...)

2506.00297

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

arXiv.org Artificial IntelligenceApr-22-2025

ProtPainter: Draw or Drag Protein via Topology-guided Diffusion

Lu, Zhengxi, Cheng, Shizhuo, Jiang, Yuru, Zhang, Yan, Zhang, Min

Recent advances in protein backbone generation have achieved promising results under structural, functional, or physical constraints. However, existing methods lack the flexibility for precise topology control, limiting navigation of the backbone space. We present ProtPainter, a diffusion-based approach for generating protein backbones conditioned on 3D curves. ProtPainter follows a two-stage process: curve-based sketching and sketch-guided backbone generation. For the first stage, we propose CurveEncoder, which predicts secondary structure annotations from a curve to parametrize sketch generation. For the second stage, the sketch guides the generative process in Denoising Diffusion Probabilistic Modeling (DDPM) to generate backbones. During this process, we further introduce a fusion scheduling scheme, Helix-Gating, to control the scaling factors. To evaluate, we propose the first benchmark for topology-conditioned protein generation, introducing Protein Restoration Task and a new metric, self-consistency Topology Fitness (scTF). Experiments demonstrate ProtPainter's ability to generate topology-fit (scTF > 0.8) and designable (scTM > 0.5) backbones, with drawing and dragging tasks showcasing its flexibility and versatility.

artificial intelligence, deep learning, machine learning, (18 more...)

2504.14274

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceMar-1-2025

Proteina: Scaling Flow-based Protein Structure Generative Models

Geffner, Tomas, Didi, Kieran, Zhang, Zuobai, Reidenbach, Danny, Cao, Zhonglin, Yim, Jason, Geiger, Mario, Dallago, Christian, Kucukbenli, Emine, Vahdat, Arash, Kreis, Karsten

Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and autoguidance for protein backbones, and new adjusted training objectives. Proteina achieves state-of-the-art performance on de novo protein backbone design and produces diverse and designable proteins at unprecedented length, up to 800 residues. The hierarchical conditioning offers novel control, enabling high-level secondary-structure guidance as well as low-level fold-specific generation.

conference paper, iclr 2025, protein, (15 more...)

2503.0071

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)