Ryu, Seongok
Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation
Yang, Soojung, Hwang, Doyeong, Lee, Seul, Ryu, Seongok, Hwang, Sung Ju
Recently, utilizing reinforcement learning (RL) to generate molecules with desired properties has been highlighted as a promising strategy for drug design. A molecular docking program - a physical simulation that estimates protein-small molecule binding affinity - can be an ideal reward scoring function for RL, as it is a straightforward proxy of the therapeutic potential. Still, two pressing challenges exist for this task. First, the models often fail to generate chemically realistic and pharmacochemically acceptable molecules. Second, docking score optimization is a difficult exploration problem that involves many local optima and a score surface that is not smooth with respect to molecular structure. To tackle these challenges, we propose a novel RL framework that generates pharmacochemically acceptable molecules with large docking scores. Our method - Fragment-based generative RL with Explorative Experience replay for Drug design (FREED) - constrains the generated molecules to a realistic and qualified chemical space and effectively explores that space to find drugs by coupling our fragment-based generation method with a novel error-prioritized experience replay (PER). We also show that our model performs well on both de novo and scaffold-based schemes. Our model produces molecules of higher quality than existing methods while achieving state-of-the-art performance on two of three targets in terms of the docking scores of the generated molecules. We further show with ablation studies that our method, predictive error-PER (FREED(PE)), significantly improves the model performance.
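The idea of prioritizing replay by predictive error can be illustrated with a minimal sketch. It is not the FREED implementation: the class name, the `alpha` and `eps` parameters, the use of absolute error as the priority, and the example docking scores are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Toy buffer: sampling probability is proportional to priority ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 fully proportional
        self.eps = eps      # keeps zero-error transitions sampleable
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, predicted_reward, observed_reward):
        # Assumed priority: absolute error of a hypothetical reward predictor.
        priority = abs(predicted_reward - observed_reward) + self.eps
        if len(self.items) >= self.capacity:
            # Drop the oldest transition when the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size):
        # Weighted sampling with replacement.
        return self.rng.choices(
            self.items, weights=self.probabilities(), k=batch_size
        )


# Hypothetical docking scores: the predictor is badly wrong about mol_B,
# so mol_B is replayed more often than mol_A.
buffer = PrioritizedReplayBuffer(capacity=3, alpha=1.0)
buffer.add("mol_A", predicted_reward=-7.0, observed_reward=-7.1)
buffer.add("mol_B", predicted_reward=-6.0, observed_reward=-9.0)
probs = buffer.probabilities()
batch = buffer.sample(4)
```

Transitions whose reward the predictor estimates poorly are revisited more often, which is one way to push the agent toward less-explored regions of chemical space.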
A benchmark study on reliable molecular supervised learning via Bayesian learning
Hwang, Doyeong, Lee, Grace, Jo, Hanseok, Yoon, Seyoul, Ryu, Seongok
Virtual screening aims to find desirable compounds in a chemical library by using computational methods. When machine learning is used for this purpose, model outputs that can be interpreted as predictive probabilities are beneficial, in that a high prediction score then corresponds to a high probability of correctness. In this work, we present a study on the prediction performance and reliability of graph neural networks (GNNs) trained with recently proposed Bayesian learning algorithms. Our work shows that Bayesian learning algorithms allow well-calibrated predictions for various GNN architectures and classification tasks. We also show the implications of reliable predictions for virtual screening, where Bayesian learning may lead to higher success in finding hit compounds.
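One common way to quantify how well prediction scores match the probability of correctness is the expected calibration error (ECE). The abstract does not say which calibration metric the study uses, so the sketch below is only an illustration; the function name, the equal-width binning, and the toy inputs are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data: scores of 0.75 that are right 3 times out of 4 are calibrated,
# scores of 0.95 that are right only half the time are overconfident.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
overconfident = expected_calibration_error([0.95] * 4, [1, 0, 1, 0])
```

A lower value means the scores can be read more directly as probabilities, which is the property that matters when ranking compounds for screening.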
Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks
Lim, Jaechang, Ryu, Seongok, Park, Kyubyong, Choe, Yo Joong, Ham, Jiyeon, Kim, Woo Youn
Accurate prediction of drug-target interaction (DTI) is essential for in silico drug design. For this purpose, we propose a novel approach for predicting DTI using a GNN that directly incorporates the 3D structure of a protein-ligand complex. We also apply a distance-aware graph attention algorithm with gate augmentation to increase the performance of our model. As a result, our model shows better performance than docking and other deep learning methods for both virtual screening and pose prediction. In addition, our model can reproduce the natural population distribution of active and inactive molecules.
Uncertainty quantification of molecular property prediction with Bayesian neural networks
Ryu, Seongok, Kwon, Yongchan, Kim, Woo Youn
Deep neural networks have outperformed existing machine learning models in various molecular applications. In practical applications, however, it is still difficult to make confident decisions because of the uncertainty in predictions arising from the insufficient quality and quantity of training data. Here, we show with three numerical experiments that Bayesian neural networks are useful for quantifying the uncertainty of molecular property prediction. In particular, this approach enables us to decompose the predictive variance into model- and data-driven uncertainties, which helps to elucidate the source of errors. In the logP predictions, we show that data noise affected the data-driven uncertainties more significantly than the model-driven ones. Based on this analysis, we were able to find unexpected errors in the Harvard Clean Energy Project dataset. Lastly, we show that the confidence of a prediction is closely related to its predictive uncertainty by performing experiments on bioactivity and toxicity classification problems.
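The decomposition of predictive variance can be sketched for a regression model that outputs a mean and a variance on each stochastic forward pass. The abstract does not give the exact formulation, so this is an assumption: it follows the widely used rule that the data-driven part is the average predicted variance and the model-driven part is the variance of the predicted means. The function name and the numbers are hypothetical.

```python
from statistics import mean, pvariance


def decompose_uncertainty(pred_means, pred_variances):
    """Split total predictive variance into data- and model-driven parts.

    pred_means[t] and pred_variances[t] come from the t-th stochastic
    forward pass (for example, one sampled set of network weights).
    """
    data_driven = mean(pred_variances)   # average noise the model predicts
    model_driven = pvariance(pred_means)  # disagreement between passes
    return data_driven, model_driven, data_driven + model_driven


# Hypothetical logP predictions from three sampled networks.
data_u, model_u, total_u = decompose_uncertainty(
    pred_means=[1.0, 2.0, 3.0],
    pred_variances=[0.5, 0.5, 0.5],
)
```

Under this rule, noisy labels mainly raise the data-driven term, while sparse coverage of chemical space mainly raises the model-driven term.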
Uncertainty quantification of molecular property prediction using Bayesian neural network models
Ryu, Seongok, Kwon, Yongchan, Kim, Woo Youn
In chemistry, deep neural network models have been increasingly utilized in a variety of applications such as molecular property prediction, novel molecule design, and chemical reaction planning. Despite the rapid increase in the use of state-of-the-art models and algorithms, deep neural network models often produce poor predictions in real applications because model performance is highly dependent on the quality of training data. In the field of molecular analysis, data are mostly obtained from either complicated chemical experiments or approximate mathematical equations, so the quality of the data may be questionable. In this paper, we quantify the uncertainties of predictions using Bayesian neural networks in molecular property prediction. We estimate both model-driven and data-driven uncertainties, demonstrating in three experiments the usefulness of uncertainty quantification as both a quality checker and a confidence indicator. Our results show that uncertainty quantification is necessary for more reliable molecular applications and that Bayesian neural network models can be a practical approach.
Molecular generative model based on conditional variational autoencoder for de novo molecular design
Lim, Jaechang, Ryu, Seongok, Kim, Jin Woo, Kim, Woo Youn
We propose a molecular generative model based on the conditional variational autoencoder for de novo molecular design. It is specialized to control multiple molecular properties simultaneously by imposing them on a latent space. As a proof of concept, we demonstrate that it can be used to generate drug-like molecules with five target properties. We were also able to adjust a single property without changing the others and to manipulate it beyond the range of the dataset.
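How a property condition might be imposed on the latent space can be shown with a small sketch. It assumes the condition vector is concatenated to the sampled latent vector before decoding, which is a common conditional VAE design but is not spelled out in the abstract. The function names, dimensions, and property values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * noise."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def decoder_input(z, condition):
    """Assumed conditioning scheme: append target properties to z."""
    return list(z) + list(condition)


rng = random.Random(0)
z = sample_latent(mu=[0.0, 0.0, 0.0], log_var=[0.0, 0.0, 0.0], rng=rng)

# Two hypothetical normalized properties; only the first one is changed.
base = decoder_input(z, condition=[0.3, 0.5])
shifted = decoder_input(z, condition=[0.9, 0.5])
```

Keeping `z` fixed while editing one entry of the condition is the mechanism that would let a single property be adjusted without changing the others.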
Deeply learning molecular structure-property relationships using graph attention neural network
Ryu, Seongok, Lim, Jaechang, Kim, Woo Youn
Molecular structure-property relationships are the key to molecular engineering for materials and drug discovery. The rise of deep learning offers a new viable solution to elucidate the structure-property relationships directly from chemical data. Here we show that graph attention networks can greatly improve the performance of deep learning for chemistry. The attention mechanism makes it possible to distinguish atoms in different environments and thus to extract important structural features that determine target properties. We demonstrated that our model can detect appropriate features for molecular polarity, solubility, and energy. Interestingly, it identified two distinct parts of molecules as essential structural features for high photovoltaic efficiency, which coincided with the areas of the donor and acceptor orbitals in charge-transfer excitations, respectively. As a result, it could accurately predict molecular properties. Moreover, the resultant latent space was well organized, such that molecules with similar properties were located close together, which is critical for successful molecular engineering.
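The way attention lets a model weight neighboring atoms differently can be sketched as follows. This is not the architecture from the paper: a plain dot product stands in for the learned attention scoring function, there are no trainable weights, and the feature vectors and adjacency are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_update(features, neighbors, atom):
    """Return the attention-weighted average of an atom's neighbor features.

    features:  list of per-atom feature vectors
    neighbors: dict mapping atom index -> neighbor indices (self included)
    """
    nbrs = neighbors[atom]
    # Stand-in scoring: dot product between the atom and each neighbor.
    scores = [
        sum(a * b for a, b in zip(features[atom], features[j])) for j in nbrs
    ]
    weights = softmax(scores)
    dim = len(features[atom])
    updated = [
        sum(w * features[j][k] for w, j in zip(weights, nbrs))
        for k in range(dim)
    ]
    return updated, weights


# Three atoms: atoms 0 and 1 share an environment, atom 2 differs.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [0, 1, 2]}
updated, weights = attention_update(features, neighbors, atom=0)
```

Neighbors in a similar environment receive larger weights, so the updated feature of atom 0 is dominated by atoms that resemble it rather than by a uniform average.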