Jones, Penelope
The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry
Thawani, Aditya R., Griffiths, Ryan-Rhys, Jamasb, Arian, Bourached, Anthony, Jones, Penelope, McCorkindale, William, Aldrick, Alexander A., Lee, Alpha A.
The space of synthesizable molecules exceeds $10^{60}$ in size, meaning only a vanishingly small fraction of these molecules has ever been realized in the lab. To prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first-principles quantum mechanical approach, and a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry.
Precision Medicine as an Accelerator for Next Generation Cognitive Supercomputing
Begoli, Edmon, Brase, Jim, DeLaRosa, Bambi, Jones, Penelope, Kusnezov, Dimitri, Paragas, Jason, Stevens, Rick, Streitz, Fred, Tourassi, Georgia
In the past several years, we have taken advantage of a number of opportunities to advance the intersection of next-generation high-performance computing, AI, and big data technologies through partnerships in precision medicine. Today we are in the throes of piecing together what is likely a unique convergence of medical data and computer technologies. But more deeply, we observe that the traditional paradigm of computer simulation and prediction needs fundamental revision. The time for this revision is now, for a number of reasons. We will review what the drivers are, why now, how this has been approached over the past several years, and where we are heading.