Grammars and reinforcement learning for molecule optimization

Nov-27-2018–arXiv.org Machine Learning

An important challenge in drug discovery is to find molecules with desired chemical properties. While ultimate usefulness as a drug can only be determined in a laboratory or clinical context, that process is expensive, and it is thus advantageous to pre-select likely candidates in software. While deep learning has been extensively investigated for molecular graph encoding ([Duvenaud et al., 2015], [Kearnes et al., 2016], [Gilmer et al., 2017]), molecule generation is still a subject of active research. The simplest natural approach to candidate molecule generation is to generate a linear representation, such as a string of characters in the SMILES format [Weininger, 1988], using an encoder-decoder network architecture similar to that used in machine translation, as done in [Gómez-Bombarelli et al., 2016]. This approach performed comparatively poorly because a molecule's structure is not linear but a graph that typically includes cycles, so it falls to the model to learn how to generate SMILES strings that correspond to chemically valid molecules, a nontrivial task that leaves the model with little spare capacity to additionally optimize a given chemical metric of the molecules produced. One partial remedy is to generate not the actual SMILES strings but a sequence of production rules of a context-free grammar (CFG) for SMILES, as done by [Kusner et al., 2017]. That guarantees that the SMILES strings produced are grammatically valid, which puts less burden on the model to ensure validity and thereby achieves better metrics. However, [Kusner et al., 2017] give two reasons why this is still not guaranteed to produce chemically valid molecules: firstly, a grammatically valid SMILES string is not guaranteed to be chemically possible (because of atom valences being wrong, for example), and secondly, because a
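The idea of emitting production rules instead of characters can be illustrated with a minimal sketch. The toy grammar below is an assumption made purely for illustration: it covers only a tiny SMILES-like subset and is not the grammar used by [Kusner et al., 2017]. The names `GRAMMAR` and `expand` are likewise hypothetical. A model would output the list of rule indices; the expansion then always yields a grammatically valid string, though not necessarily a chemically valid one.

```python
# Toy CFG for a tiny SMILES-like subset (hypothetical; illustration only).
GRAMMAR = {
    "chain": [["atom"], ["atom", "chain"], ["atom", "bond", "chain"]],
    "atom": [["C"], ["N"], ["O"], ["F"]],
    "bond": [["="], ["#"]],
}


def expand(rule_choices, start="chain"):
    """Expand the leftmost nonterminal with each chosen rule index in turn."""
    symbols = [start]   # pending symbols, leftmost first
    output = []         # terminal characters emitted so far
    used = 0            # number of rule choices consumed
    while symbols:
        symbol = symbols.pop(0)
        if symbol in GRAMMAR:
            if used >= len(rule_choices):
                raise ValueError("rule sequence ended before derivation finished")
            production = GRAMMAR[symbol][rule_choices[used]]
            used += 1
            symbols = list(production) + symbols
        else:
            output.append(symbol)
    if used != len(rule_choices):
        raise ValueError("unused rule choices remain")
    return "".join(output)


# chain -> atom bond chain, atom -> C, bond -> '=', chain -> atom, atom -> O
print(expand([2, 0, 0, 0, 2]))
# Same derivation shape with fluorine on both sides of the double bond.
print(expand([2, 3, 0, 0, 3]))
```

The first call produces `C=O`, which is a plausible fragment. The second produces `F=F`, which the grammar accepts but which is chemically impossible because fluorine forms only one bond; this is the valence problem the text describes.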

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.98)
  - Machine Learning
    - Reinforcement Learning (0.65)
    - Neural Networks > Deep Learning (0.34)
