Objective: The objective of this paper is to highlight state-of-the-art machine learning (ML) techniques in computational docking. The use of smart computational methods in the drug design life cycle is a relatively recent development that has gained much popularity and interest over the last few years. Central to this methodology is the notion of computational docking, which is the process of predicting the best pose (orientation and conformation) of a small molecule (drug candidate) when bound to a larger target receptor molecule (protein) to form a stable complex. In computational docking, a large number of binding poses are evaluated and ranked using a scoring function. The scoring function is a mathematical predictive model that produces a score representing the binding free energy, and hence the stability, of the resulting complex.
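The evaluate-and-rank step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Pose` fields, the `toy_score` function, and its weights are hypothetical stand-ins, not a scoring function from the paper or from any docking package. The only convention taken from the text is that the score approximates binding free energy, so a lower score is read here as a more stable complex.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pose:
    """A candidate binding pose. The fields are illustrative placeholders."""
    name: str
    contacts: int  # favorable ligand-receptor contacts (hypothetical feature)
    clashes: int   # steric clashes (hypothetical feature)


def toy_score(pose: Pose) -> float:
    """Toy stand-in for a scoring function; lower means more stable.

    The weights are arbitrary and not fitted to any data.
    """
    return -0.5 * pose.contacts + 2.0 * pose.clashes


def rank_poses(poses: List[Pose],
               score: Callable[[Pose], float]) -> List[Tuple[float, Pose]]:
    """Score every pose and return (score, pose) pairs, best (lowest) first."""
    return sorted(((score(p), p) for p in poses), key=lambda sp: sp[0])


poses = [Pose("pose_a", 10, 1), Pose("pose_b", 14, 0), Pose("pose_c", 6, 3)]
ranked = rank_poses(poses, toy_score)
best_score, best_pose = ranked[0]
```

In an ML-based pipeline, `toy_score` would be replaced by a trained model's prediction; the ranking logic around it stays the same.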
Many of today's most urgent problems demand new molecules and materials, from antimicrobial drugs to fight superbugs and antivirals to treat novel pandemics to more sustainable photosensitive coatings for semiconductors and next-generation polymers to capture carbon dioxide right at its source. We can design these from scratch, using AI to expedite the otherwise expensive and slow process, or we can tweak existing molecules to fine-tune the properties we care about -- such as toxicity, activity, or stability. Starting from a known molecule is like getting a head start on the design and production of candidate molecules, as we know they have some of the characteristics we need, and we can use existing knowledge and manufacturing pipelines to synthesize and test them down the line. The challenge in this process, called molecular optimization, is that tweaking an existing molecule can produce a huge number of variants. They won't all have the desired properties, and evaluating them empirically to find those that do would take too much time and money to be feasible.
While AI can lift competition and productivity, it also can act as a great leveler, putting smaller players on the same footing as goliaths. Take pharmaceutical research, for example. Large companies have the budget and resources to physically test millions of drug candidates, giving them an advantage over startups and researchers. But smaller labs can achieve similar results by harnessing neural networks that simulate how a potential drug molecule will bind with a target protein. Deep learning can help smaller companies and other researchers discover promising drug treatments by improving the speed and accuracy of molecular docking, the process of computationally predicting how and how well a molecule binds with a protein.
The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve on the state of the art by a factor of almost three, bringing statistical methods one step closer to the holy grail of "chemical accuracy".
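One plausible reading of "enforcing invariance stochastically rather than structurally" is data augmentation: instead of building a representation that is invariant to atom ordering by construction, the learner is shown many randomly reordered copies of the same molecule. The sketch below illustrates that idea under stated assumptions. The pairwise distance matrix is a simple stand-in representation chosen for the example, not necessarily the one used in the paper, and the coordinates are made up.

```python
import math
import random


def distance_matrix(coords):
    """Pairwise Euclidean distances between atoms (assumed representation)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def permute(matrix, order):
    """Reorder rows and columns of a square matrix by the same atom ordering."""
    return [[matrix[i][j] for j in order] for i in order]


def random_permutations(matrix, n_samples, rng):
    """Sample randomly reordered copies of one molecule's matrix.

    Training on all copies with the same energy label encourages a model
    to become invariant to atom ordering without hard-coding it.
    """
    n = len(matrix)
    samples = []
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        samples.append(permute(matrix, order))
    return samples


# A made-up three-atom geometry; the coordinates are illustrative only.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
base = distance_matrix(coords)
augmented = random_permutations(base, n_samples=4, rng=random.Random(0))
```

A structural alternative would fix a canonical atom ordering (for example, sorting rows by norm) so that every molecule has exactly one representation; the augmentation route trades that guarantee for a larger, more varied training set.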
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied to the drug-target affinity (DTA) prediction problem. However, previous deep learning-based methods do not model the direct interactions between the drug and protein residues. This can lead to inaccurate learning of the target representation, which may change due to drug binding effects. In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets, neglecting proteins outside those datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with an attention mechanism, to address the changes in target representation caused by binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of the residue-drug complex. The resulting model is an expressive deep nested graph neural network. We also use pre-trained protein representations, building on recent efforts in learning contextualized protein representations. The experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embedding and the advantages of GEFA in modeling the nested graph for drug-target interaction.
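The graph-in-graph construction can be pictured with a small data-structure sketch. This is not the GEFA implementation: the `Graph` class, the `fuse` helper, the feature vectors, and the choice of which residues connect to the drug node are all hypothetical, and a plain mean over atom features stands in for the learned graph neural network and attention that the abstract describes.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal undirected graph with one feature vector per node."""
    features: list
    edges: set = field(default_factory=set)  # stored as (i, j) with i < j

    def add_node(self, feature):
        self.features.append(list(feature))
        return len(self.features) - 1

    def add_edge(self, i, j):
        self.edges.add((min(i, j), max(i, j)))


def mean_pool(graph):
    """Collapse a whole graph into one vector (stand-in for a learned GNN)."""
    n = len(graph.features)
    dim = len(graph.features[0])
    return [sum(f[k] for f in graph.features) / n for k in range(dim)]


def fuse(drug, residues, contact_residues):
    """Copy the residue graph and add the pooled drug graph as one extra node.

    contact_residues lists the residue indices linked to the drug node;
    how these are chosen is an assumption of this sketch.
    """
    fused = Graph([list(f) for f in residues.features], set(residues.edges))
    drug_node = fused.add_node(mean_pool(drug))
    for r in contact_residues:
        fused.add_edge(r, drug_node)
    return fused, drug_node


# Toy inputs: a three-atom drug and a four-residue protein chain.
drug = Graph([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], {(0, 1), (1, 2)})
protein = Graph([[0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [0.4, 0.6]],
                {(0, 1), (1, 2), (2, 3)})
fused, drug_node = fuse(drug, protein, contact_residues=[1, 2])
```

Because the drug enters the residue graph before any affinity prediction is made, message passing over `fused` can update residue representations in light of the drug, which is the sense of "early fusion" in the name.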