SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery

Honda, Shion, Shi, Shoi, Ueda, Hiroki R.

Nov-12-2019–arXiv.org Machine Learning

Shion Honda 1,2,3, Shoi Shi 1,2,3, Hiroki R. Ueda 1,2,3

1 University of Tokyo
2 International Research Center for Neurointelligence
3 RIKEN Center for Biosystems Dynamics Research

shion honda@ipc.i.u-tokyo.ac.jp, {sshoi0322-tky, uedah-tky}@umin.ac.jp

Abstract

In drug-discovery-related tasks such as virtual screening, machine learning is emerging as a promising way to predict molecular properties. Conventionally, molecular fingerprints (numerical representations of molecules) are calculated through rule-based algorithms that map molecules to a sparse discrete space. However, these algorithms perform poorly for shallow prediction models or small datasets.

To address this issue, we present SMILES Transformer. Inspired by Transformer and pre-trained language models from natural language processing, SMILES Transformer learns molecular fingerprints through unsupervised pre-training of the sequence-to-sequence language model using a huge corpus of SMILES, a text representation system for molecules. We performed benchmarks on 10 datasets against existing fingerprints and graph-based methods and demonstrated the superiority of the proposed algorithms in small-data settings where pre-training facilitated good generalization.
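Because the abstract treats SMILES as text for a sequence-to-sequence language model, it may help to see how a SMILES string could be turned into a sequence of integer ids. The sketch below is an illustration only: the abstract does not describe the paper's tokenization, so the regular expression, the reserved `<pad>` and `<unk>` ids, and the helper names `tokenize`, `build_vocab`, and `encode` are all assumptions.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms such as [C@H], the two-letter halogens Br and Cl,
# two-digit ring closures such as %12, then any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of string tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(corpus):
    """Assign an integer id to each distinct token; ids 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize(smiles):
            # setdefault leaves existing entries alone and gives new tokens the next id
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab):
    """Map a SMILES string to token ids, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(smiles)]


# Toy corpus: ethanol, chlorobenzene, alanine
corpus = ["CCO", "c1ccccc1Cl", "C[C@H](N)C(=O)O"]
vocab = build_vocab(corpus)
ids = encode("CCBr", vocab)  # "Br" never appears in the toy corpus
```

Here `Br` falls back to the `<unk>` id because it is absent from the toy corpus. A sequence model would consume id lists like `ids`, typically padded to a common length within a batch.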

dataset, fingerprint, representation

Country:
- North America > United States (0.28)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.44)

Genre:
- Research Report > New Finding (0.47)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
