Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules

Kim, Joseph C., Bloore, David, Kapoor, Karan, Feng, Jun, Hao, Ming-Hong, Wang, Mengdi

Jan-8-2024, arXiv.org Artificial Intelligence

The Boltzmann distribution of a protein provides a roadmap to all of its functional states. Normalizing flows are a promising tool for modeling this distribution, but current methods are computationally intractable for typical pharmacological targets because of the size of the system, the heterogeneity of the intramolecular potential energy, and long-range interactions. To remedy these issues, we present a novel flow architecture that uses split channels and gated attention to efficiently learn the conformational distribution of proteins defined by internal coordinates. We show that a 2-Wasserstein loss smooths the transition from maximum likelihood training to energy-based training, enabling the training of Boltzmann Generators for macromolecules. We evaluate our model and training strategy on villin headpiece HP35(nle-nle), a 35-residue subdomain, and protein G, a 56-residue protein. We demonstrate that standard architectures and training strategies, such as maximum likelihood alone, fail, while our novel architecture and multi-stage training strategy are able to model the conformational distributions of protein G and HP35.

The structural ensemble of a protein determines its functions. The probabilities of the ground and metastable states of a protein at equilibrium at a given temperature determine the interactions of the protein with other proteins, effectors, and drugs, which are key to pharmaceutical development. However, enumerating the equilibrium conformations and their probabilities is infeasible. Since complete knowledge is inaccessible, we must adopt a sampling approach. Conventional approaches to sampling the equilibrium ensemble rely on Markov-chain Monte Carlo or molecular dynamics (MD). These approaches explore the local energy landscape adjacent to a starting point; however, they are limited by their inability to cross high energy barriers. In addition, MD simulations are expensive and scale poorly with system size.

artificial intelligence, machine learning, protein

Country:
- Europe > France (0.14)
- North America > United States (0.14)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Learning Graphical Models (0.75)
  - Neural Networks (0.93)
  - Statistical Learning (0.66)