Collaborating Authors

Rise of the smartish machines


CAS, a division of the American Chemical Society, which publishes C&EN, is also getting into the game. Last year, CAS licensed ChemPlanner, a retrosynthesis engine developed by the scientific publisher John Wiley & Sons. Later this year, CAS will introduce a version that incorporates a trove of new data, including its collection of human-curated reactions. According to Matthew J. Toussant, CAS's senior vice president of product and content operations, CAS's SciFinder-n will add 90 million reactions to the 2 million now available in ChemPlanner.

IBM Brings AI Retrosynthetic Analysis to the Cloud - IBM Research Blog


The future of computing is one of the strongest transformational forces on our planet. Everything we touch has built-in computing capabilities and is generating tremendous volumes of data. The impact is speeding up not only our daily lives but also more traditional industrial sectors, including chemistry. Last year at the ACS Fall Meeting 2018 in Boston, IBM Research released IBM RXN for Chemistry, a cloud-based app built on the idea of treating organic chemistry as a language. The magic behind the app is a state-of-the-art neural machine translation method, which can predict the most likely outcome of a chemical reaction using sequence-to-sequence (seq2seq) models.
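To make the "chemistry as a language" idea concrete, here is a minimal sketch of how a reaction written in SMILES notation could be split into tokens before being fed to a seq2seq model. The regular expression and the `tokenize` helper are illustrative assumptions covering common SMILES symbols only; they are not taken from IBM RXN for Chemistry.

```python
import re
from typing import List

# Illustrative pattern (an assumption, not IBM's tokenizer): bracket atoms,
# two-letter halogens, single-letter atoms, aromatic atoms, bonds, branches,
# ring-closure digits, and the ">" reaction separator.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES or reaction SMILES string into model tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


# Esterification of acetic acid with ethanol, written as "reactants>>product".
reaction = "CC(=O)O.OCC>>CC(=O)OCC"
print(" ".join(tokenize(reaction)))
```

Each token then plays the role that a word plays in machine translation: the reactant tokens form the source sentence and the product tokens form the sentence to be predicted.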

AI for Chemistry - ChemIntelligence


Artificial Intelligence (AI) is increasingly used by chemists to perform a variety of tasks. Originally, research on AI applied to chemistry was largely fueled by the need to accelerate drug discovery and reduce its huge costs and the time to market for new drugs. So far, AI has made significant progress toward accelerating drug discovery R&D. However, the applications of AI in chemistry are not limited to drug discovery, as discussed in a recent review. In this article, we will provide a general picture of how AI can help chemists be faster and more creative in their research.

Predicting retrosynthetic pathways using a combined linguistic model and hyper-graph exploration strategy - Machine Learning

We present an extension of our Molecular Transformer architecture combined with a hyper-graph exploration strategy for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate the single-step retrosynthetic models, using a forward prediction model and a reaction classification model, both also based on the transformer architecture. The hypergraph is constructed on the fly, and the nodes are filtered and further expanded based on a Bayesian-like probability. We critically assessed the end-to-end framework with several retrosynthesis examples from literature and academic exams. Overall, the framework performs very well, with a few weaknesses due to the bias induced during the training process. The use of the newly introduced metrics opens up the possibility of optimizing entire retrosynthetic frameworks by focusing on the performance of the single-step model only.
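The abstract names the metrics without defining them, so the following is only a sketch under stated assumptions. It reads round-trip accuracy as the fraction of suggested precursor sets that a forward model maps back to the original target, and coverage as the fraction of targets with at least one such suggestion. The `forward` callable, the toy lookup table, and the two-distribution form of the Jensen-Shannon divergence are all illustrative choices, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence


def round_trip_accuracy(suggestions: Dict[str, List[str]],
                        forward: Callable[[str], str]) -> float:
    """Fraction of suggested precursor sets that the forward model maps
    back to the target (assumed definition)."""
    total = sum(len(precursors) for precursors in suggestions.values())
    if total == 0:
        return 0.0
    hits = sum(1 for target, precursors in suggestions.items()
               for p in precursors if forward(p) == target)
    return hits / total


def coverage(suggestions: Dict[str, List[str]],
             forward: Callable[[str], str]) -> float:
    """Fraction of targets with at least one round-trip-valid suggestion
    (assumed definition)."""
    if not suggestions:
        return 0.0
    covered = sum(1 for target, precursors in suggestions.items()
                  if any(forward(p) == target for p in precursors))
    return covered / len(suggestions)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy stand-in for a trained forward model (hypothetical lookup table).
toy_forward = {"A.B": "T1", "C.D": "X", "E.F": "Y"}.get
toy_suggestions = {"T1": ["A.B", "C.D"], "T2": ["E.F"]}
print(round_trip_accuracy(toy_suggestions, toy_forward))
print(coverage(toy_suggestions, toy_forward))
print(js_divergence([0.7, 0.3], [0.4, 0.6]))
```

The string comparison assumes that targets and forward predictions are written as canonical SMILES, so that identical molecules compare equal. Because both quantities depend only on single-step predictions, they can be computed without running a full route search.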

Learning to Make Generalizable and Diverse Predictions for Retrosynthesis - Machine Learning

We propose a new model for making generalizable and diverse retrosynthetic reaction predictions. Given a target compound, the task is to predict the likely chemical reactants to produce the target. This generative task can be framed as a sequence-to-sequence problem by using the SMILES representations of the molecules. Building on top of the popular Transformer architecture, we propose two novel pre-training methods that construct relevant auxiliary tasks (plausible reactions) for our problem. Furthermore, we incorporate a discrete latent variable model into the architecture to encourage the model to produce a diverse set of alternative predictions. On the 50k subset of reaction examples from the United States patent literature (USPTO-50k) benchmark dataset, our model greatly improves performance over the baseline, while also generating predictions that are more diverse.
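As a rough illustration of the sequence-to-sequence framing, the sketch below turns a reaction SMILES string of the form "reactants>agents>product" into a (source, target) training pair for retrosynthesis, where the product is the model input and the reactants are the output to predict. The `retro_pair` helper and the example reaction are assumptions for illustration and are not taken from the paper or from USPTO-50k.

```python
from typing import Tuple


def retro_pair(reaction_smiles: str) -> Tuple[str, str]:
    """Return (source, target) for retrosynthesis: product in, reactants out."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>agents>product'")
    reactants, _agents, product = parts
    if not reactants or not product:
        raise ValueError("reaction needs both reactants and a product")
    return product, reactants


# Esterification of acetic acid with ethanol, with no agents listed.
source, target = retro_pair("CC(=O)O.OCC>>CC(=O)OCC")
print(source)  # the product SMILES given to the model
print(target)  # the reactant SMILES the model should generate
```

A forward reaction predictor would use the same pair with source and target swapped.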