chemical shift
Seek and You Shall Fold
Sellam, Nadav Bojan, Bojan, Meital, Schanda, Paul, Bronstein, Alex
Accurate protein structures are essential for understanding biological function, yet incorporating experimental data into protein generative models remains a major challenge. Most predictors of experimental observables are non-differentiable, making them incompatible with gradient-based conditional sampling. This is especially limiting in nuclear magnetic resonance, where rich data such as chemical shifts are hard to directly integrate into generative modeling. We introduce a framework for non-differentiable guidance of protein generative models, coupling a continuous diffusion-based generator with any black-box objective via a tailored genetic algorithm. We demonstrate its effectiveness across three modalities: pairwise distance constraints, nuclear Overhauser effect restraints, and for the first time chemical shifts. These results establish chemical shift guided structure generation as feasible, expose key weaknesses in current predictors, and showcase a general strategy for incorporating diverse experimental signals. Our work points toward automated, data-conditioned protein modeling beyond the limits of differentiability.
- North America > United States > Michigan (0.04)
- Europe > Austria (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- (2 more...)
NMR-Solver: Automated Structure Elucidation via Large-Scale Spectral Matching and Physics-Guided Fragment Optimization
Jin, Yongqi, Wang, Jun-Jie, Xu, Fanjie, Ji, Xiaohong, Gao, Zhifeng, Zhang, Linfeng, Ke, Guolin, Zhu, Rong, E, Weinan
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. However, the interpretation of NMR spectra to determine unknown molecular structures remains a labor-intensive and expertise-dependent process, particularly for complex or novel compounds. Although recent methods have been proposed for molecular structure elucidation, they often underperform in real-world applications due to inherent algorithmic limitations and limited high-quality data. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from $^1$H and $^{13}$C NMR spectra. Our method introduces an automated framework for molecular structure elucidation, integrating large-scale spectral matching with physics-guided fragment-based optimization that exploits atomic-level structure-spectrum relationships in NMR. We evaluate NMR-Solver on simulated benchmarks, curated experimental data from the literature, and real-world experiments, demonstrating its strong generalization, robustness, and practical utility in challenging, real-life scenarios. NMR-Solver unifies computational NMR analysis, deep learning, and interpretable chemical reasoning into a coherent system. By incorporating the physical principles of NMR into molecular optimization, it enables scalable, automated, and chemically meaningful molecular identification, establishing a generalizable paradigm for solving inverse problems in molecular science.
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Materials > Chemicals (0.46)
DiffNMR: Diffusion Models for Nuclear Magnetic Resonance Spectra Elucidation
Yang, Qingsong, Wu, Binglan, Liu, Xuwei, Chen, Bo, Li, Wei, Long, Gen, Chen, Xin, Xiao, Mingjun
Nuclear Magnetic Resonance (NMR) spectroscopy is a central characterization method for molecular structure elucidation, yet interpreting NMR spectra to deduce molecular structures remains challenging due to the complexity of spectral data and the vastness of the chemical space. In this work, we introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra. DiffNMR refines molecular graphs iteratively through a diffusion-based generative process, ensuring global consistency and mitigating error accumulation inherent in autoregressive methods. The framework integrates a two-stage pretraining strategy that aligns spectral and molecular representations via diffusion autoencoder (Diff-AE) and contrastive learning, the incorporation of retrieval initialization and similarity filtering during inference, and a specialized NMR encoder with radial basis function (RBF) encoding for chemical shifts, preserving continuity and chemical correlation. Experimental results demonstrate that DiffNMR achieves competitive performance for NMR-based structure elucidation, offering an efficient and robust solution for automated molecular analysis.
- North America > United States (0.14)
- Asia > China (0.04)
Representing local protein environments with atomistic foundation models
Bojan, Meital, Vedula, Sanketh, Maddipatla, Advaith, Sellam, Nadav Bojan, Napoli, Federico, Schanda, Paul, Bronstein, Alex M.
The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs), and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Kansas > Sheridan County (0.04)
- Europe > Austria (0.04)
- (2 more...)
AI-enabled prediction of NMR spectroscopy: Deducing 2-D NMR of carbohydrate
Li, Yunrui, Xu, Hao, Hong, Pengyu
In the dynamic field of nuclear magnetic resonance (NMR) spectroscopy, artificial intelligence (AI) has ushered in a transformative era for molecular studies. AI-driven NMR prediction, powered by advanced machine learning and predictive algorithms, has fundamentally reshaped the interpretation of NMR spectra. This innovation empowers us to forecast spectral patterns swiftly and accurately across a broad spectrum of molecular structures. Furthermore, the advent of generative modeling offers a groundbreaking approach, making it feasible to make informed prediction of 2D NMR from chemical language (such as SMILES, IUPAC Name). Our method mirrors the multifaceted nature of NMR imaging experiments, producing 2D NMRs for the same molecule based on different conditions, such as solvents and temperatures. Our methodology is versatile, catering to both monosaccharide-derived small molecules, oligosaccharides and large polysaccharides. A deeper exploration of the discrepancies in these predictions can provide insights into the influence of elements such as functional groups, repeating units, and the modification of the monomers on the outcomes. Given the complex nature involved in the generation of 2D NMRs, our objective is to fully leverage the potential of AI to enhance the precision, efficiency, and comprehensibility of NMR spectral analysis, ultimately advancing both the field of NMR spectroscopy and the broader realm of molecular research.
- Research Report > Promising Solution (0.48)
- Overview > Innovation (0.34)
GlycoNMR: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks
Chen, Zizhang, Badman, Ryan Paul, Foley, Lachele, Woods, Robert, Hong, Pengyu
Molecular representation learning (MRL) is a powerful tool for bridging the gap between machine learning and chemical sciences, as it converts molecules into numerical representations while preserving their chemical features. These encoded representations serve as a foundation for various downstream biochemical studies, including property prediction and drug design. MRL has had great success with proteins and general biomolecule datasets. Yet, in the growing sub-field of glycoscience (the study of carbohydrates, where longer carbohydrates are also called glycans), MRL methods have been barely explored. This under-exploration can be primarily attributed to the limited availability of comprehensive and well-curated carbohydrate-specific datasets and a lack of Machine learning (ML) pipelines specifically tailored to meet the unique problems presented by carbohydrate data. Since interpreting and annotating carbohydrate-specific data is generally more complicated than protein data, domain experts are usually required to get involved. The existing MRL methods, predominately optimized for proteins and small biomolecules, also cannot be directly used in carbohydrate applications without special modifications. To address this challenge, accelerate progress in glycoscience, and enrich the data resources of the MRL community, we introduce GlycoNMR. GlycoNMR contains two laboriously curated datasets with 2,609 carbohydrate structures and 211,543 annotated nuclear magnetic resonance (NMR) chemical shifts for precise atomic-level prediction. We tailored carbohydrate-specific features and adapted existing MRL models to tackle this problem effectively. For illustration, we benchmark four modified MRL models on our new datasets.
- North America > United States (0.28)
- Indian Ocean (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Materials > Chemicals (0.93)
Carbohydrate NMR chemical shift predictions using E(3) equivariant graph neural networks
Bånkestad, Maria, Dorst, Keven M., Widmalm, Göran, Rönnols, Jerk
Carbohydrates, vital components of biological systems, are well-known for their structural diversity. Nuclear Magnetic Resonance (NMR) spectroscopy plays a crucial role in understanding their intricate molecular arrangements and is essential in assessing and verifying the molecular structure of organic molecules. An important part of this process is to predict the NMR chemical shift from the molecular structure. This work introduces a novel approach that leverages E(3) equivariant graph neural networks to predict carbohydrate NMR spectra. Notably, our model achieves a substantial reduction in mean absolute error, up to threefold, compared to traditional models that rely solely on two-dimensional molecular structure. Even with limited data, the model excels, highlighting its robustness and generalization capabilities. The implications are far-reaching and go beyond an advanced understanding of carbohydrate structures and spectral interpretation. For example, it could accelerate research in pharmaceutical applications, biochemistry, and structural biology, offering a faster and more reliable analysis of molecular structures. Furthermore, our approach is a key step towards a new data-driven era in spectroscopy, potentially influencing spectroscopic techniques beyond NMR.
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Italy > Veneto (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
NMR shift prediction from small data quantities
Rull, Herman, Fischer, Markus, Kuhn, Stefan
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model which is able to achieve good results with comparatively low amounts of data. We show this by predicting 19F and 13C NMR chemical shifts of small molecules in specific solvents.
Predicting Chemical Shifts with Graph Neural Networks
Inferring molecular structure from NMR measurements requires an accurate forward model that can predict chemical shifts from 3D structure. Current forward models are limited to specific molecules like proteins and state of the art models are not differentiable. Thus they cannot be used with gradient methods like biased molecular dynamics. Here we use graph neural networks (GNNs) for NMR chemical shift prediction. Our GNN can model chemical shifts accurately and capture important phenomena like hydrogen bonding induced downfield shift between multiple proteins, secondary structure effects, and predict shifts of organic molecules.
NMR Assignment through Linear Programming
Bravo-Ferreira, Jose F. S., Cowburn, David, Khoo, Yuehaw, Singer, Amit
Nuclear Magnetic Resonance (NMR) Spectroscopy is the second most used technique (after X-ray crystallography) for structural determination of proteins. A computational challenge in this technique involves solving a discrete optimization problem that assigns the resonance frequency to each atom in the protein. We present a novel linear programming formulation of the problem which gives state-of-the-art results in simulated and experimental datasets.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom (0.04)