Pre-training via Denoising for Molecular Property Prediction

Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, Jonathan Godwin

arXiv.org Artificial Intelligence 

Work done during an internship at DeepMind.

Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Relying on the well-known link between denoising autoencoders and score-matching, we show that the denoising objective corresponds to learning a molecular force field - arising from approximating the Boltzmann distribution with a mixture of Gaussians - directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors - dataset sizes, model size and architecture, and the choice of upstream and downstream datasets - on pre-training.

The success of the best-performing neural networks in vision and natural language processing (NLP) relies on pre-training the models on large datasets to learn meaningful features for downstream tasks (Dai & Le, 2015; Simonyan & Zisserman, 2014; Devlin et al., 2018; Brown et al., 2020; Dosovitskiy et al., 2020). In molecular property prediction, by contrast, none of the best models on the widely used QM9 benchmark use any form of pre-training. Effective methods for pre-training could have a significant impact on fields such as drug discovery and materials science. In this work, we focus on the problem of how large datasets of 3D molecular structures can be utilized to improve performance on downstream molecular property prediction tasks that also rely on 3D structures as input.
Our answer is a form of self-supervised pre-training that generates useful representations for downstream prediction tasks, leading to state-of-the-art (SOTA) results. Inspired by recent advances in noise regularization for graph neural networks (GNNs) (Godwin et al., 2022), our pre-training objective is based on denoising in the space of structures (and is hence self-supervised). Unlike existing pre-training methods, which largely focus on 2D graphs, our approach targets the setting where the downstream task involves 3D point clouds defining the molecular structure.
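
To make the denoising objective concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes the common parameterization in which an equilibrium structure x is perturbed as x + sigma * eps with eps drawn from a standard Gaussian, and the model is trained to predict eps with a squared-error loss. The function names (`perturb`, `gaussian_score`, `denoising_loss`), the toy three-atom coordinates, and the value of `sigma` are illustrative assumptions; a real model would be a GNN rather than the stand-in callables used here.

```python
import numpy as np


def perturb(positions, sigma, rng):
    """Return (noisy positions, eps) where noisy = positions + sigma * eps."""
    eps = rng.standard_normal(positions.shape)
    return positions + sigma * eps, eps


def gaussian_score(noisy, clean, sigma):
    """Gradient w.r.t. `noisy` of log N(noisy; clean, sigma^2 I)."""
    return -(noisy - clean) / sigma**2


def denoising_loss(predict_eps, positions, sigma, rng):
    """Squared error between predicted and true noise, averaged over atoms."""
    noisy, eps = perturb(positions, sigma, rng)
    pred = predict_eps(noisy)
    return float(np.mean(np.sum((pred - eps) ** 2, axis=-1)))


rng = np.random.default_rng(0)

# Hypothetical toy structure: three atoms with 3D coordinates (assumed values).
positions = np.array([[0.00, 0.00, 0.0],
                      [0.96, 0.00, 0.0],
                      [-0.24, 0.93, 0.0]])
sigma = 0.05  # assumed noise scale

# Stand-in "models": one that always predicts zero noise, and an oracle
# that is given the clean structure and therefore recovers eps exactly.
zero_loss = denoising_loss(lambda noisy: np.zeros_like(noisy),
                           positions, sigma, rng)
oracle_loss = denoising_loss(lambda noisy: (noisy - positions) / sigma,
                             positions, sigma, rng)

print("zero predictor loss:", zero_loss)
print("oracle predictor loss:", oracle_loss)
```

In this parameterization the score of the Gaussian centred on the clean structure equals -eps / sigma, so a network that predicts the noise is, up to that scaling, estimating the score of the noise-smoothed distribution. This is the sense in which denoising relates to learning a force field under the mixture-of-Gaussians approximation described above.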
