Many of today's most urgent problems demand new molecules and materials, from antimicrobial drugs to fight superbugs and antivirals to treat novel pandemics to more sustainable photosensitive coatings for semiconductors and next-generation polymers to capture carbon dioxide right at its source. We can design these from scratch, using AI to expedite the otherwise expensive and slow process, or we can tweak existing molecules to fine-tune the properties we care about -- such as toxicity, activity, or stability. Starting from a known molecule is like getting a head start on the design and production of candidate molecules, as we know they have some of the characteristics we need, and we can use existing knowledge and manufacturing pipelines to synthesize and test them down the line. The challenge in this process, called molecular optimization, is that tweaking an existing molecule can produce a huge number of variants. They won't all have the desired properties, and evaluating them empirically to find those that do would take too much time and money to be feasible.
The recent COVID-19 pandemic has highlighted the need for rapid therapeutic development for infectious diseases. To accelerate this process, we present a deep learning based generative modeling framework, CogMol, to design drug candidates specific to a given target protein sequence with high off-target selectivity. We augment this generative framework with an in silico screening process that accounts for toxicity, to lower the failure rate of the generated drug candidates in later stages of the drug development pipeline. We apply this framework to three relevant proteins of the SARS-CoV-2, the virus responsible for COVID-19, namely non-structural protein 9 (NSP9) replicase, main protease, and the receptor-binding domain (RBD) of the S protein. Docking to the target proteins demonstrate the potential of these generated molecules as ligands. Structural similarity analyses further imply novelty of the generated molecules with respect to the training dataset as well as possible biological association of a number of generated molecules that might be of relevance to COVID-19 therapeutic design. While the validation of these molecules is underway, we release ~ 3000 novel COVID-19 drug candidates generated using our framework. URL : http://ibm.biz/covid19-mol
Variational autoencoders (VAEs) defined over SMILES string and graph-based representations of molecules promise to improve the optimization of molecular properties, thereby revolutionizing the pharmaceuticals and materials industries. However, these VAEs are hindered by the non-unique nature of SMILES strings and the computational cost of graph convolutions. To efficiently pass messages along all paths through the molecular graph, we encode multiple SMILES strings of a single molecule using a set of stacked recurrent neural networks, pooling hidden representations of each atom between SMILES representations, and use attentional pooling to build a final fixed-length latent representation. By then decoding to a disjoint set of SMILES strings of the molecule, our All SMILES VAE learns an almost bijective mapping between molecules and latent representations near the high-probability-mass subspace of the prior. Our SMILES-derived but molecule-based latent representations significantly surpass the state-of-the-art in a variety of fully- and semi-supervised property regression and molecular property optimization tasks.
As a promising tool to navigate in the vast chemical space, artificial intelligence (AI) is leveraged for drug design. From the year 2017 to 2021, the number of applications of several recent AI models (i.e. graph neural network (GNN), recurrent neural network (RNN), variation autoencoder (VAE), generative adversarial network (GAN), flow and reinforcement learning (RL)) in drug design increases significantly. Many relevant literature reviews exist. However, none of them provides an in-depth summary of many applications of the recent AI models in drug design. To complement the existing literature, this survey includes the theoretical development of the previously mentioned AI models and detailed summaries of 42 recent applications of AI in drug design. Concretely, 13 of them leverage GNN for molecular property prediction and 29 of them use RL and/or deep generative models for molecule generation and optimization. In most cases, the focus of the summary is the models, their variants, and modifications for specific tasks in drug design. Moreover, 60 additional applications of AI in molecule generation and optimization are briefly summarized in a table. Finally, this survey provides a holistic discussion of the abundant applications so that the tasks, potential solutions, and challenges in AI-based drug design become evident.
Efficiently discovering molecules that meet various property requirements can significantly benefit the drug discovery industry. Since it is infeasible to search over the entire chemical space, recent works adopt generative models for goal-directed molecular generation. They tend to utilize the iterative processes, optimizing the parameters of the molecular generative models at each iteration to produce promising molecules for further validation. Assessments are exploited to evaluate the generated molecules at each iteration, providing direction for model optimization. However, most previous works require a massive number of expensive and time-consuming assessments, e.g., wet experiments and molecular dynamic simulations, leading to the lack of practicability. To reduce the assessments in the iterative process, we propose a cost-effective evolution strategy in latent space, which optimizes the molecular latent representation vectors instead. We adopt a pre-trained molecular generative model to map the latent and observation spaces, taking advantage of the large-scale unlabeled molecules to learn chemical knowledge. To further reduce the number of expensive assessments, we introduce a pre-screener as the proxy to the assessments. We conduct extensive experiments on multiple optimization tasks comparing the proposed framework to several advanced techniques, showing that the proposed framework achieves better performance with fewer assessments.