Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning

Praski, Mateusz, Adamczyk, Jakub, Czech, Wojciech

Sep-16-2025–arXiv.org Artificial Intelligence

Pretrained neural networks have attracted significant interest in chemistry and small molecule drug design. Embed-dings from these models are widely used for molecular property prediction, virtual screening, and small data learning in molecular chemistry. This study presents the most extensive comparison of such models to date, evaluating 25 models across 25 datasets. Under a fair comparison framework, we assess models spanning various modalities, architectures, and pretraining strategies. Using a dedicated hierarchical Bayesian statistical testing model, we arrive at a surprising result: nearly all neural models show negligible or no improvement over the baseline ECFP molecular fingerprint. Only the CLAMP model, which is also based on molecular fingerprints, performs statistically significantly better than the alternatives. These findings raise concerns about the evaluation rigor in existing studies. We discuss potential causes, propose solutions, and offer practical recommendations.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Sep-16-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Poland (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning (0.93)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found