phy
Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models
Xie, Yu, Winkler, Ludwig, Sun, Lixin, Lewis, Sarah, Foster, Adam E., Luna, José Jiménez, Hempel, Tim, Gastegger, Michael, Chen, Yaoyi, Zaporozhets, Iryna, Clementi, Cecilia, Bishop, Christopher M., Noé, Frank
The rare-event sampling problem has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that rely on states which are rare in equilibrium, for example folding free energies. Here, we introduce enhanced diffusion sampling, enabling efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to perform quantitatively accurate steering protocols to generate biased ensembles and subsequently recover equilibrium statistics via exact reweighting. We instantiate our framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), $Δ$G-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue for metadynamics). Across toy systems, protein folding landscapes and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to hours per system -- closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Energy (0.68)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Energy (0.47)
- Health & Medicine (0.46)
Mean-field theory of graph neural networks in graph partitioning
Tatsuro Kawamoto, Masashi Tsubaki, Tomoyuki Obuchi
A theoretical performance analysis of the graph neural network (GNN) is presented. For classification tasks, the neural network approach has the advantage in terms of flexibility that it can be employed in a data-driven manner, whereas Bayesian inference requires the assumption of a specific model. A fundamental question is then whether GNN has a high accuracy in addition to this flexibility.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- South America > Brazil > São Paulo (0.05)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- (4 more...)
Cutting Through the Noise: On-the-fly Outlier Detection for Robust Training of Machine Learning Interatomic Potentials
Lam, Terry C. W., O'Neill, Niamh, Schran, Christoph, Schaaf, Lars L.
The accuracy of machine learning interatomic potentials suffers from reference data that contains numerical noise. Often originating from unconverged or inconsistent electronic-structure calculations, this noise is challenging to identify. Existing mitigation strategies such as manual filtering or iterative refinement of outliers, require either substantial expert effort or multiple expensive retraining cycles, making them difficult to scale to large datasets. Here, we introduce an on-the-fly outlier detection scheme that automatically down-weights noisy samples, without requiring additional reference calculations. By tracking the loss distribution via an exponential moving average, this unsupervised method identifies outliers throughout a single training run. We show that this approach prevents overfitting and matches the performance of iterative refinement baselines with significantly reduced overhead. The method's effectiveness is demonstrated by recovering accurate physical observables for liquid water from unconverged reference data, including diffusion coefficients. Furthermore, we validate its scalability by training a foundation model for organic chemistry on the SPICE dataset, where it reduces energy errors by a factor of three. This framework provides a simple, automated solution for training robust models on imperfect datasets across dataset sizes.
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Africa > Comoros > Grande Comore > Moroni (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (0.67)