Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features
Rekut, Nikolai; Orlov, Alexey; Ziu, Klea; Starykh, Elizaveta; Takac, Martin; Beznosikov, Aleksandr
arXiv.org Artificial Intelligence
In recent years, machine learning has reshaped chemistry, advancing applications that range from the prediction of molecular properties to the generation of molecular structures. Language models and graph-based models are widely used in this domain and consistently achieve state-of-the-art results across many tasks. However, representing chemical compounds in the SMILES format -- used by most datasets and many language models -- has notable limitations as a training data format. Chemical fingerprints, in contrast, offer a more physically informed representation of compounds, which makes them better suited to model training. This study aims to develop a language model trained specifically on fingerprints. We also introduce a bimodal architecture that integrates this language model with a graph model. The proposed method uses RoBERTa as the language model and Graph Isomorphism Networks (GIN), Graph Convolutional Networks (GCN), and Graphormer as graph models. This integration significantly improves predictive performance over conventional strategies on tasks such as Quantitative Structure-Activity Relationship (QSAR) modeling and the prediction of nuclear magnetic resonance (NMR) spectra, among others.
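The bimodal idea in the abstract can be illustrated with a toy, dependency-free sketch. Everything in it is an assumption made for illustration: the abstract does not say how fingerprints are tokenized or how the two models are combined, so the on-bit tokenization, the concatenation-based fusion, and the names `fingerprint_to_tokens`, `fuse`, and `linear_head` are hypothetical and not the authors' method. The embeddings are plain lists standing in for the outputs of a language model and a graph model.

```python
from typing import List


def fingerprint_to_tokens(bits: List[int]) -> List[str]:
    """Turn a binary fingerprint into a token sequence of its on-bit indices.

    Assumption: one plausible way to feed a fingerprint to a language model;
    the source does not specify the tokenization.
    """
    return [f"fp_{i}" for i, bit in enumerate(bits) if bit]


def fuse(text_emb: List[float], graph_emb: List[float]) -> List[float]:
    """Combine the two modality embeddings by simple concatenation.

    Assumption: late fusion by concatenation is used here only as the
    simplest possible stand-in for the paper's bimodal integration.
    """
    return list(text_emb) + list(graph_emb)


def linear_head(features: List[float], weights: List[float], bias: float) -> float:
    """Map the fused features to a single predicted property value."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Usage with made-up numbers (not data from the paper).
tokens = fingerprint_to_tokens([0, 1, 1, 0, 1])   # input to the language model
fused = fuse([0.5, -1.0], [2.0])                  # pretend encoder outputs
prediction = linear_head(fused, [1.0, 1.0, 1.0], 0.0)
```

In a real pipeline the two placeholder embeddings would come from trained encoders, and the fusion step could be replaced by any other combination scheme without changing the surrounding code.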
Feb-25-2025
- Country:
- Asia > Middle East
- UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States (1.00)
- Genre:
- Research Report
- Experimental Study (0.46)
- New Finding (0.68)
- Industry:
- Government > Regional Government
- North America Government > United States Government > FDA (0.68)
- Health & Medicine
- Government Relations & Public Policy (0.68)
- Pharmaceuticals & Biotechnology (1.00)
- Public Health (0.68)
- Therapeutic Area
- Immunology (1.00)
- Infections and Infectious Diseases (0.70)
- Neurology (0.68)
- Materials > Chemicals (0.89)
- Technology: