Grammars and reinforcement learning for molecule optimization
An important challenge in drug discovery is to find molecules with desired chemical properties. While ultimate usefulness as a drug can only be determined in a laboratory or clinical context, that process is expensive, and it is thus advantageous to pre-select likely candidates in software. While deep learning has been extensively investigated for molecular graph encoding ([Duvenaud et al., 2015], [Kearnes et al., 2016], [Gilmer et al., 2017]), molecule generation is still a subject of active research. The simplest natural approach to candidate molecule generation is to generate a linear representation, such as a string of characters in the SMILES format [Weininger, 1988], using an encoder-decoder network architecture similar to that used in machine translation, as done in [Gómez-Bombarelli et al., 2016]. This approach performed comparatively poorly because a molecule's structure is not linear but a graph, which typically includes cycles. It therefore falls to the model to learn how to generate SMILES strings that correspond to chemically valid molecules, a nontrivial task that leaves the model with little spare capacity to additionally optimize a given chemical metric of the molecules produced. One partial remedy is to generate not the actual SMILES strings but a sequence of production rules of a context-free grammar (CFG) for SMILES, as done by [Kusner et al., 2017]. That guarantees that the SMILES strings produced are grammatically valid, putting less burden on the model to ensure validity and thereby achieving better metrics. However, [Kusner et al., 2017] give two reasons why this is still not guaranteed to produce chemically valid molecules: firstly, a grammatically valid SMILES string is not guaranteed to be chemically possible (because of atom valences being wrong, for example), and secondly, because a
Nov-27-2018
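To make the idea of generating production rules rather than characters concrete, the following is a minimal sketch in Python. The grammar is a deliberately tiny, hypothetical SMILES-like grammar invented for illustration; it is not the grammar used by [Kusner et al., 2017], and the names RULES and decode are assumptions of this sketch. Each rule index stands in for one output step of a model, and the decoder always expands the leftmost nonterminal, so any complete rule sequence it accepts yields a string that is valid under this toy grammar.

```python
# Toy context-free grammar for a tiny SMILES-like language.
# Illustrative assumption only; not the grammar from the cited work.
RULES = [
    ("chain", ["atom"]),                     # 0: end of a chain
    ("chain", ["atom", "chain"]),            # 1: extend a chain
    ("chain", ["atom", "branch", "chain"]),  # 2: atom with one branch
    ("branch", ["(", "chain", ")"]),         # 3: parenthesized branch
    ("atom", ["C"]),                         # 4
    ("atom", ["N"]),                         # 5
    ("atom", ["O"]),                         # 6
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def decode(rule_ids, start="chain"):
    """Apply each rule to the leftmost nonterminal and return the string.

    Raises ValueError if a rule does not match the leftmost nonterminal,
    if rules remain after the derivation is complete, or if nonterminals
    remain after all rules have been applied.
    """
    symbols = [start]
    for rule_id in rule_ids:
        lhs, rhs = RULES[rule_id]
        # Locate the leftmost symbol that can still be expanded.
        idx = next(
            (i for i, s in enumerate(symbols) if s in NONTERMINALS), None
        )
        if idx is None:
            raise ValueError("derivation is already complete")
        if symbols[idx] != lhs:
            raise ValueError(
                f"rule {rule_id} expands {lhs!r}, "
                f"but leftmost nonterminal is {symbols[idx]!r}"
            )
        symbols[idx:idx + 1] = rhs
    if any(s in NONTERMINALS for s in symbols):
        raise ValueError("derivation is incomplete")
    return "".join(symbols)


# Ethanol-like string: chain -> atom chain -> C chain -> ... -> CCO
linear = decode([1, 4, 1, 4, 0, 6])
# Branched string built with the branch rule.
branched = decode([1, 4, 2, 4, 3, 0, 6, 0, 4])
print(linear, branched)
```

Parentheses can only be emitted in matched pairs here because they appear together in a single rule, which is the sense in which a rule sequence guarantees grammatical validity. Nothing in the sketch checks valences or any other chemical constraint.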