Benchmarking Simulacra AI's Quantum Accurate Synthetic Data Generation for Chemical Sciences

Falcioni, Fabio, Orlova, Elena, Heightman, Timothy, Mantrov, Philip, Ustimenko, Aleksei

Nov-12-2025–arXiv.org Artificial Intelligence

In this work, we benchmark \simulacra's synthetic data generation pipeline against a state-of-the-art Microsoft pipeline on a dataset of small to large systems. By analyzing the energy quality, autocorrelation times, and effective sample size, our findings show that Simulacra's Large Wavefunction Models (LWM) pipeline, paired with state-of-the-art Variational Monte Carlo (VMC) sampling algorithms, reduces data generation costs by 15-50x, while maintaining parity in energy accuracy, and 2-3x compared to traditional CCSD methods on the scale of amino acids. This enables the creation of affordable, large-scale \textit{ab-initio} datasets, accelerating AI-driven optimization and discovery in the pharmaceutical industry and beyond. Our improvements are based on a novel and proprietary sampling scheme called Replica Exchange with Langevin Adaptive eXploration (RELAX).

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Nov-12-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)
- Europe > United Kingdom
  - England (0.28)

Genre:
- Research Report > New Finding (0.86)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.68)
  - Machine Learning
    - Neural Networks (0.95)
    - Statistical Learning (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found