AITopics | Yim, Jason

Collaborating Authors

Yim, Jason

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Proteina: Scaling Flow-based Protein Structure Generative Models

Geffner, Tomas, Didi, Kieran, Zhang, Zuobai, Reidenbach, Danny, Cao, Zhonglin, Yim, Jason, Geiger, Mario, Dallago, Christian, Kucukbenli, Emine, Vahdat, Arash, Kreis, Karsten

arXiv.org Artificial IntelligenceMar-1-2025

Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and autoguidance for protein backbones, and new adjusted training objectives. Proteina achieves state-of-the-art performance on de novo protein backbone design and produces diverse and designable proteins at unprecedented length, up to 800 residues. The hierarchical conditioning offers novel control, enabling high-level secondary-structure guidance as well as low-level fold-specific generation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.0071

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Add feedback

MotifBench: A standardized protein design benchmark for motif-scaffolding problems

Zheng, Zhuoqi, Zhang, Bo, Didi, Kieran, Yang, Kevin K., Yim, Jason, Watson, Joseph L., Chen, Hai-Feng, Trippe, Brian L.

arXiv.org Artificial IntelligenceFeb-17-2025

The motif-scaffolding problem is a central task in computational protein design: Given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been made due to computational evaluation with reliable protein structure prediction and fixed-backbone sequence design methods [1-17]. However, significant variability in evaluation strategies across publications has hindered comparability of results, challenged reproducibility, and impeded robust progress. In response we introduce MotifBench, comprising (1) a precisely specified pipeline and evaluation metrics, (2) a collection of 30 benchmark problems, and (3) an implementation of this benchmark and leaderboard at github.com/blt2114/MotifBench. The MotifBench test cases are more difficult compared to earlier benchmarks (e.g. A motif-scaffolding method takes a motif as input and returns a set of putatively compatible scaffolds as output. This section details how motifs and scaffolds in MotifBench are specified, proposes metrics by which a scaffold set is evaluated, and describes how these metrics are computed. Appendix A describes considerations upon which these specifications and metrics were chosen. Motif specification (inputs): A motif is specified by the coordinates of the backbone atoms of several residues and (in some cases) the amino acid types of a subset of those residues.

artificial intelligence, machine learning, scaffold, (19 more...)

arXiv.org Artificial Intelligence

2502.12479

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Generator Matching: Generative modeling with arbitrary Markov processes

Holderrieth, Peter, Havasi, Marton, Yim, Jason, Shaul, Neta, Gat, Itai, Jaakkola, Tommi, Karrer, Brian, Chen, Ricky T. Q., Lipman, Yaron

arXiv.org Artificial IntelligenceDec-9-2024

We introduce generator matching, a modality-agnostic framework for generative modeling using arbitrary Markov processes. Generators characterize the infinitesimal evolution of a Markov process, which we leverage for generative modeling in a similar vein to flow matching: we construct conditional generators which generate single data points, then learn to approximate the marginal generator which generates the full data distribution. We show that generator matching unifies various generative modeling methods, including diffusion models, flow matching and discrete diffusion models. Furthermore, it provides the foundation to expand the design space to new and unexplored Markov processes such as jump processes. Finally, generator matching enables the construction of superpositions of Markov generative processes and enables the construction of multimodal models in a rigorous manner. We empirically validate our method on protein and image structure generation, showing that superposition with a jump process improves image generation.

artificial intelligence, machine learning, probability path, (17 more...)

arXiv.org Artificial Intelligence

2410.20587

Genre: Research Report > Promising Solution (0.67)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Add feedback

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

Campbell, Andrew, Yim, Jason, Barzilay, Regina, Rainforth, Tom, Jaakkola, Tommi

arXiv.org Artificial IntelligenceFeb-7-2024

Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a specific instance while allowing improved performance over existing diffusion-based approaches. We utilize our DFMs method to build a multimodal flow-based modeling framework. We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of the sequence or structure.

machine learning, natural language, rate matrix, (16 more...)

arXiv.org Artificial Intelligence

2402.04997

Country:

North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improved motif-scaffolding with SE(3) flow matching

Yim, Jason, Campbell, Andrew, Mathieu, Emile, Foong, Andrew Y. K., Gastegger, Michael, Jiménez-Luna, José, Lewis, Sarah, Satorras, Victor Garcia, Veeling, Bastiaan S., Noé, Frank, Barzilay, Regina, Jaakkola, Tommi S.

arXiv.org Machine LearningJan-8-2024

Protein design often begins with knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/ microsoft/frame-flow.

artificial intelligence, machine learning, scaffold, (18 more...)

arXiv.org Machine Learning

2401.04082

Country:

North America > United States (0.46)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Fast non-autoregressive inverse folding with discrete diffusion

Yang, John J., Yim, Jason, Barzilay, Regina, Jaakkola, Tommi

arXiv.org Machine LearningDec-4-2023

Generating protein sequences that fold into a intended 3D structure is a fundamental step in de novo protein design. De facto methods utilize autoregressive generation, but this eschews higher order interactions that could be exploited to improve inference speed. We describe a non-autoregressive alternative that performs inference using a constant number of calls resulting in a 23 times speed up without a loss in performance on the CATH benchmark. Conditioned on the 3D structure, we fine-tune ProteinMPNN to perform discrete diffusion with a purity prior over the index sampling order. Our approach gives the flexibility in trading off inference speed and accuracy by modulating the diffusion speed.

artificial intelligence, diffusion, machine learning, (15 more...)

arXiv.org Machine Learning

2312.02447

Country: North America > United States > Massachusetts (0.16)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Add feedback

Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing

Kirjner, Andrew, Yim, Jason, Samusevich, Raman, Jaakkola, Tommi, Barzilay, Regina, Fiete, Ila

arXiv.org Artificial IntelligenceJul-2-2023

The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Gibbs sampling with Graph-based Smoothing (GGS) which iteratively applies Gibbs with gradients to propose advantageous mutations using graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

2307.00494

Country: North America > United States > Massachusetts (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Add feedback

SE(3) diffusion model with application to protein backbone generation

Yim, Jason, Trippe, Brian L., De Bortoli, Valentin, Mathieu, Emile, Doucet, Arnaud, Barzilay, Regina, Jaakkola, Tommi

arXiv.org Artificial IntelligenceMay-22-2023

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation preserving rigid motions in R3, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2302.02277

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Trippe, Brian L., Yim, Jason, Tischer, Doug, Baker, David, Broderick, Tamara, Barzilay, Regina, Jaakkola, Tommi

arXiv.org Artificial IntelligenceMar-19-2023

Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

artificial intelligence, machine learning, scaffold, (20 more...)

arXiv.org Artificial Intelligence

2206.04119

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback