We describe a fully data-driven model that learns to perform retrosynthetic reaction prediction, which we treat as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture consisting of two recurrent neural networks. This architecture has previously shown great success in other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, spanning 10 broad reaction types commonly used by medicinal chemists. We find that our model performs comparably to a rule-based expert system baseline. It also overcomes certain limitations of rule-based expert systems and of any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.
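To make the sequence-to-sequence framing concrete, the sketch below turns one reaction into a (source, target) token pair for retrosynthesis. The product SMILES is the source and the reactant SMILES is the target. It is a minimal sketch under stated assumptions. The regex tokenizer and the `<RX_n>` reaction-class prefix token are illustrative choices, not necessarily the preprocessing used for the 50,000-reaction dataset. The names `tokenize_smiles` and `make_retro_pair` are hypothetical.

```python
import re

# Simplified SMILES tokenizer (an assumption for illustration). It splits out
# bracket atoms, two-letter halogens, %nn ring closures, single-letter atoms,
# bond and branch symbols, and ring-closure digits.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnosp]|[()=#\-+\\/:~@?>*$.]|\d)"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, failing if any character is skipped."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def make_retro_pair(reaction_smiles: str, reaction_class: int) -> tuple[list[str], list[str]]:
    """Turn 'reactants>>product' into a (source, target) pair for retrosynthesis.

    The source is the product, prefixed with a hypothetical reaction-class
    token such as '<RX_3>'. The target is the set of reactants.
    """
    reactants, product = reaction_smiles.split(">>")
    source = [f"<RX_{reaction_class}>"] + tokenize_smiles(product)
    target = tokenize_smiles(reactants)
    return source, target
```

For example, take the esterification of acetic acid with ethanol under an arbitrary class index of 1. Calling `make_retro_pair("CC(=O)O.OCC>>CC(=O)OCC", 1)` yields a source of `<RX_1>` followed by the nine product tokens. The target has eleven reactant tokens, including the `.` that separates the two molecules. An encoder-decoder network would then learn to map source sequences to target sequences.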
Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose the identification of productive mechanistic steps as a ranking problem: rank potential orbital interactions so that the top-ranked interactions yield the major products. The machine learning implementation follows a two-stage approach. First, we train atom-level reactivity filters that prune 94.0% of non-productive reactions with a false negative rate below 0.1%. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time. This rises to 99.9% when top-ranked lists with at most four non-productive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions that the rule-based expert system does not handle.
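The two-stage approach can be sketched as a filter followed by a sort. In this minimal sketch, the trained atom-level reactivity filters and the ranking ensemble are replaced by plain callables, `atom_reactivity` and `pair_score`. The names `Candidate`, `two_stage_rank`, and `ranked_within` are hypothetical. `ranked_within` is one possible reading of the evaluation criteria:

- `max_nonproductive=0` checks that all productive mechanisms sit at the top of the list.
- `max_nonproductive=4` tolerates up to four non-productive steps ranked above the last productive one.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One hypothetical source-to-sink orbital interaction in a reaction system."""
    source_atom: int
    sink_atom: int
    productive: bool  # ground-truth label, used only for evaluation


def two_stage_rank(
    candidates: Sequence[Candidate],
    atom_reactivity: Callable[[int], float],
    pair_score: Callable[[Candidate], float],
    threshold: float,
) -> list[Candidate]:
    # Stage 1: atom-level filter. Drop any interaction whose source or sink
    # atom scores below the threshold (stand-in for a trained reactivity filter).
    kept = [
        c for c in candidates
        if atom_reactivity(c.source_atom) >= threshold
        and atom_reactivity(c.sink_atom) >= threshold
    ]
    # Stage 2: rank the surviving orbital pairs by a productivity score
    # (stand-in for the trained ranking ensemble), highest first.
    return sorted(kept, key=pair_score, reverse=True)


def ranked_within(
    ranked: Sequence[Candidate], total_productive: int, max_nonproductive: int = 0
) -> bool:
    """True if all productive steps appear before more than
    `max_nonproductive` non-productive steps have been seen."""
    if total_productive == 0:
        return True
    found = nonproductive = 0
    for c in ranked:
        if c.productive:
            found += 1
            if found == total_productive:
                return True
        else:
            nonproductive += 1
            if nonproductive > max_nonproductive:
                return False
    # Some productive steps never appeared, e.g. they were pruned in stage 1.
    return False
```

Passing `total_productive` counted before filtering means that a productive step wrongly pruned by stage 1 (a false negative) also makes the check fail. Keeping the filter separate from the ranker means pairwise scoring only sees the candidates that survive atom-level pruning.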
University of Cambridge researchers have designed a machine learning algorithm that predicts the outcomes of complex chemical reactions with over 90% accuracy, outperforming trained chemists. The algorithm also suggests ways to make complex molecules, removing a significant hurdle in drug discovery. It shows chemists how to make target compounds, providing the chemical "map" to the desired destination. The results are reported in two studies in the journals ACS Central Science and Chemical Communications. A central challenge in drug discovery and materials science is finding ways to make complicated organic molecules by chemically joining together simpler building blocks. The problem is that those building blocks often react in unexpected ways.
The future of computing is one of the strongest transformational forces on our planet. Almost everything we touch has built-in computing capabilities and is generating tremendous volumes of data. The impact is not only speeding up our daily lives but also transforming more traditional industrial sectors, including chemistry. Last year, at the ACS Fall Meeting 2018 in Boston, IBM Research released IBM RXN for Chemistry. This cloud-based app treats organic chemistry as a language. The magic behind the app is a state-of-the-art neural machine translation method that predicts the most likely outcome of a chemical reaction using sequence-to-sequence (seq2seq) models.
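Predicting the "most likely outcome" with a seq2seq model means searching over possible output token sequences. A common approximation is beam search. This is assumed here for illustration and is not taken from the IBM RXN description. The minimal sketch below uses a hand-written probability table (`TOY_TABLE`) in place of a trained network. `beam_search` and `toy_model` are hypothetical helpers, not part of any IBM RXN API.

```python
import math
from typing import Callable

# A next-token model maps a token prefix to {next_token: probability}.
NextTokenFn = Callable[[tuple[str, ...]], dict[str, float]]


def beam_search(
    next_token_probs: NextTokenFn,
    beam_width: int = 3,
    max_len: int = 20,
    eos: str = "</s>",
) -> list[tuple[tuple[str, ...], float]]:
    """Return hypotheses as (tokens, total log-probability), best first."""
    beams: list[tuple[tuple[str, ...], float]] = [((), 0.0)]
    finished: list[tuple[tuple[str, ...], float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every token with nonzero probability.
        expanded = [
            (prefix + (tok,), logp + math.log(p))
            for prefix, logp in beams
            for tok, p in next_token_probs(prefix).items()
            if p > 0
        ]
        # Keep only the `beam_width` highest-scoring expansions.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, logp in expanded[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, logp))
            else:
                beams.append((seq, logp))
        if not beams:
            break
    # Hypotheses still open after max_len steps are returned as well.
    finished.extend(beams)
    return sorted(finished, key=lambda item: item[1], reverse=True)


# Toy probability table standing in for a trained seq2seq decoder.
TOY_TABLE = {
    (): {"C": 0.6, "O": 0.4},
    ("C",): {"C": 0.5, "</s>": 0.5},
    ("O",): {"</s>": 1.0},
    ("C", "C"): {"</s>": 1.0},
}


def toy_model(prefix: tuple[str, ...]) -> dict[str, float]:
    return TOY_TABLE.get(prefix, {"</s>": 1.0})


if __name__ == "__main__":
    for tokens, logp in beam_search(toy_model):
        print("".join(t for t in tokens if t != "</s>"), round(math.exp(logp), 3))
```

With this table, greedy decoding (`beam_width=1`) commits to `C` because it has probability 0.6. It then ends with a sequence of probability 0.3. A beam of width 3 also keeps `O` alive and returns `("O", "</s>")` with probability 0.4 as the best hypothesis. This variant scores by raw log-probability, without length normalization.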