From Molecules to Mixtures: Learning Representations of Olfactory Mixture Similarity using Inductive Biases
Tom, Gary, Ser, Cher Tian, Rajaonson, Ella M., Lo, Stanley, Park, Hyun Suk, Lee, Brian K., Sanchez-Lengeling, Benjamin
Olfaction -- how molecules are perceived as odors by humans -- remains poorly understood. Recently, the principal odor map (POM) was introduced to digitize the olfactory properties of single compounds. However, smells in real life are not pure single molecules, but complex mixtures of molecules, whose representations remain relatively under-explored. In this work, we introduce POMMix, an extension of the POM to represent mixtures. Our representation builds upon the symmetries of the problem space in a hierarchical manner: (1) graph neural networks for building molecular embeddings, (2) attention mechanisms for aggregating molecular representations into mixture representations, and (3) cosine prediction heads to encode olfactory perceptual distance in the mixture embedding space. POMMix achieves state-of-the-art predictive performance across multiple datasets. We also evaluate the generalizability of the representation on multiple splits when applied to unseen molecules and mixture sizes. Our work advances the effort to digitize olfaction, and highlights the synergy of domain expertise and deep learning in crafting expressive representations in low-data regimes.
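The hierarchical idea in the abstract (per-molecule embeddings, attention pooling into a mixture embedding, cosine distance head) can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the POMMix architecture itself: the attention score `w`, the softmax temperature, and the pooling form are all illustrative assumptions.

```python
import numpy as np

def attention_pool(mol_embeddings, w, temperature=1.0):
    """Aggregate per-molecule embeddings (n_molecules, dim) into one
    mixture embedding via a simple softmax attention over a learned
    score vector w. Hypothetical form; POMMix's attention may differ."""
    scores = mol_embeddings @ w / temperature       # (n_molecules,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax attention weights
    return weights @ mol_embeddings                 # (dim,)

def cosine_distance(a, b):
    """Perceptual-distance head: 1 - cosine similarity of mixture embeddings."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

With a zero score vector the attention is uniform and the pooled embedding is the plain mean of the molecular embeddings; identical mixtures then sit at cosine distance zero.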
Ranking over Regression for Bayesian Optimization and Molecule Selection
Tom, Gary, Lo, Stanley, Corapi, Samantha, Aspuru-Guzik, Alan, Sanchez-Lengeling, Benjamin
Bayesian optimization (BO) has become an indispensable tool for autonomous decision-making across diverse applications, from autonomous vehicle control to accelerated drug and materials discovery. With the growing interest in self-driving laboratories, BO of chemical systems is crucial for machine learning (ML) guided experimental planning. Typically, BO employs a regression surrogate model to predict the distribution of unseen parts of the search space. However, when selecting molecules, i.e., picking the top candidates from a distribution, the relative ordering of their properties may be more important than their exact values. In this paper, we introduce Rank-based Bayesian Optimization (RBO), which utilizes a ranking model as the surrogate. We present a comprehensive investigation of RBO's optimization performance compared to conventional BO on various chemical datasets. Our results demonstrate similar or improved optimization performance using ranking models, particularly for datasets with rough structure-property landscapes and activity cliffs. Furthermore, we observe a high correlation between surrogate ranking ability and BO performance, and this ability is maintained even at early iterations of BO when using ranking surrogate models. We conclude that RBO is an effective alternative to regression-based BO, especially for optimizing novel chemical compounds.
GAUCHE: A Library for Gaussian Processes in Chemistry
Griffiths, Ryan-Rhys, Klarner, Leo, Moss, Henry B., Ravuri, Aditya, Truong, Sang, Stanton, Samuel, Tom, Gary, Rankovic, Bojana, Du, Yuanqi, Jamasb, Arian, Deshwal, Aryan, Schwartz, Julius, Tripp, Austin, Kell, Gregory, Frieder, Simon, Bourached, Anthony, Chan, Alex, Moss, Jacob, Guo, Chengzhi, Durholt, Johannes, Chaurasia, Saudamini, Strieth-Kalthoff, Felix, Lee, Alpha A., Cheng, Bingqing, Aspuru-Guzik, Alán, Schwaller, Philippe, Tang, Jian
We introduce GAUCHE, a library for GAUssian processes in CHEmistry. Gaussian processes have long been a cornerstone of probabilistic machine learning, affording particular advantages for uncertainty quantification and Bayesian optimisation. Extending Gaussian processes to chemical representations, however, is nontrivial, necessitating kernels defined over structured inputs such as graphs, strings and bit vectors. By defining such kernels in GAUCHE, we seek to open the door to powerful tools for uncertainty quantification and Bayesian optimisation in chemistry. Motivated by scenarios frequently encountered in experimental chemistry, we showcase applications for GAUCHE in molecular discovery and chemical reaction optimisation. The codebase is made available at https://github.com/leojklarner/gauche
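A representative example of a kernel over structured chemical inputs is the Tanimoto (Jaccard) kernel on binary fingerprints, k(x, y) = ⟨x, y⟩ / (‖x‖² + ‖y‖² − ⟨x, y⟩). The standalone sketch below re-implements this well-known kernel in plain numpy for illustration; it is not GAUCHE's own API, whose kernel classes and signatures should be taken from the repository linked above.

```python
import numpy as np

def tanimoto_kernel(X, Y):
    """Tanimoto kernel between rows of two binary fingerprint matrices.
    k(x, y) = <x, y> / (|x|^2 + |y|^2 - <x, y>), giving 1 for identical
    fingerprints and values in [0, 1] otherwise."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    dot = X @ Y.T
    x_sq = (X * X).sum(axis=1)[:, None]
    y_sq = (Y * Y).sum(axis=1)[None, :]
    return dot / (x_sq + y_sq - dot)
```

Kernels of this kind, defined directly on bit vectors, let a Gaussian process operate on molecular fingerprints without embedding them in a Euclidean feature space first.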
Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS
Tom, Gary, Hickman, Riley J., Zinzuwadia, Aniket, Mohajeri, Afshan, Sanchez-Lengeling, Benjamin, Aspuru-Guzik, Alan
Deep learning models that leverage large datasets are often the state of the art for modelling molecular properties. When the datasets are smaller (< 2000 molecules), it is not clear that deep learning approaches are the right modelling tool. In this work we perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets. Using different molecular representations and models, we analyse the quality of their predictions and uncertainties across a variety of tasks (binary classification, regression) and datasets. We also introduce two simulated experiments that evaluate their performance: (1) Bayesian optimization guided molecular design, and (2) inference on out-of-distribution data via ablated cluster splits. We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments. We have packaged our analysis into the DIONYSUS repository, which is open-sourced to aid in reproducibility and extension to new datasets.
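One common calibration check for probabilistic regressors is interval coverage: for well-calibrated Gaussian predictions, about 90% of observations should fall inside each central 90% predictive interval. The sketch below implements this generic diagnostic with the standard library and numpy; it is an illustration of the kind of calibration analysis described, not DIONYSUS's exact metric.

```python
import numpy as np
from math import erf, sqrt

def interval_coverage(y_true, mu, sigma, level=0.9):
    """Fraction of observations inside the central `level` predictive
    interval of Gaussian predictions N(mu, sigma^2). For a calibrated
    model this fraction should be close to `level`."""
    # Find z with P(|Z| <= z) = level for a standard normal,
    # by bisecting the normal CDF (erf-based, stdlib only).
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if erf(mid / sqrt(2)) < level:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    inside = np.abs(np.asarray(y_true) - np.asarray(mu)) <= z * np.asarray(sigma)
    return float(inside.mean())
```

Coverage well below the nominal level indicates overconfident uncertainties; coverage well above it indicates underconfident ones.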