Gx2Mol: De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning

Li, Chen; Matsukiyo, Yuki; Yamanishi, Yoshihiro

arXiv.org Artificial Intelligence 

Abstract-- De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most methods in previous studies learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account the drug responses of the biological systems consisting of genes and proteins. In this study, we propose a deep generative model, Gx2Mol, which utilizes gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder is employed as a feature extractor to learn the latent feature distribution of the gene expression profiles. Then, a long short-term memory (LSTM) network is leveraged as the chemical generator to produce syntactically valid SMILES strings that satisfy the feature conditions of the gene expression profile extracted by the feature extractor. Experimental results and case studies demonstrate that the proposed Gx2Mol model can produce new molecules with potential bioactivities and drug-like properties.

Exploring the chemical space to discover molecules with therapeutic effects (e.g., anticancer drug production) is a time-consuming, costly, and high-risk task in the drug discovery field.

However, most methods in previous studies focused on learning the syntax and semantics of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings.
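
The two-stage design described in the abstract (a variational autoencoder that maps a gene expression profile to a latent feature vector, followed by an autoregressive generator conditioned on that vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy latent dimensions, and the `toy_next_token` stand-in for a trained LSTM step are all assumptions made for illustration, and the text above does not specify how the condition is fed to the generator. The sketch shows only the standard VAE reparameterization step, the standard Gaussian KL regularizer, and a generic conditional decoding loop.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    mu and log_var would come from a trained encoder applied to a gene
    expression profile; here they are plain lists (assumption).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def generate(z, next_token, max_len=100, end="</s>"):
    """Autoregressive decoding conditioned on the latent vector z.

    next_token(z, prefix) is a hypothetical stand-in for one step of a
    trained LSTM generator; it returns the next SMILES token.
    """
    tokens = []
    for _ in range(max_len):
        tok = next_token(z, tokens)
        if tok == end:
            break
        tokens.append(tok)
    return "".join(tokens)


def toy_next_token(z, prefix):
    """Toy generator step: ignores z and emits the tokens of ethanol."""
    script = ["C", "C", "O", "</s>"]
    return script[len(prefix)]


rng = random.Random(0)
mu, log_var = [0.0, 1.0], [0.0, 0.0]   # toy encoder outputs (assumption)
z = reparameterize(mu, log_var, rng)
smiles = generate(z, toy_next_token)
```

In a trained model, `next_token` would depend on `z`, so that different gene expression profiles steer the generator toward different molecules; the toy step above ignores `z` purely to keep the example deterministic.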