Bond-Centered Molecular Fingerprint Derivatives: A BBBP Dataset Study
–arXiv.org Artificial Intelligence
A strong and fast baseline in molecular property prediction is a Random Forest (RF) trained on ECFP4/ECFP6 descriptors. In practice, the count-based variant of ECFP generally outperforms the binary variant, especially for classification. Recent deep-learning approaches can match or exceed these baselines, including pretrained transformer-CNN models (5) and graph neural networks such as Chemprop or AttentiveFP (6). Chemprop's key architectural choice is directed, bond-centered message passing, in contrast to the atom-centered formulations used by many other MPNNs. Because much of the remaining architecture is comparable across message-passing GNNs, this raises a focused question: what concrete advantage does the bond-centered formulation confer over atom-centered approaches? To isolate this representational factor, we introduce a static Bond-Centered Fingerprint (BCFP) that mirrors Chemprop's bond-centric view, and we compare it directly against ECFP using a lightweight Random Forest or XGBoost pipeline on the Blood-Brain Barrier Penetration (BBBP) classification task. To our knowledge, this is the first study to propose BCFP and analyze its complementarity with ECFP (7). Our results indicate that concatenating atom- and bond-centered fingerprints yields efficient and effective models for BBBP prediction, clarifying why bond-centric message passing often appears among top-k performers while offering a simple, fast alternative to full neural architectures.
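To make the atom-centered versus bond-centered distinction concrete, here is a minimal pure-Python sketch of the general idea. It is not the authors' BCFP definition, which the abstract does not spell out. Everything in it is an assumption made for illustration: the toy molecule encoding (a list of atom symbols plus `(i, j, order)` bond tuples), the helper names, the MD5-based hashing, the 64-bin width, and the use of undirected bonds (Chemprop itself passes messages over directed bonds). Unlike real ECFP, it does not deduplicate equivalent substructures.

```python
import hashlib
from collections import Counter


def stable_hash(label, n_bits):
    """Map a string label to a bin index; hashlib keeps this stable across runs."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16) % n_bits


def refine_and_count(labels, neighbours, radius, n_bits):
    """Count hashed labels, then repeatedly extend each label with its
    sorted neighbour labels and count the hashed result again."""
    counts = Counter(stable_hash(l, n_bits) for l in labels)
    for _ in range(radius):
        labels = [
            labels[i] + "(" + ",".join(sorted(tag + labels[j] for j, tag in neighbours[i])) + ")"
            for i in range(len(labels))
        ]
        counts.update(stable_hash(l, n_bits) for l in labels)
    return [counts.get(k, 0) for k in range(n_bits)]


def atom_centered(atoms, bonds, radius=2, n_bits=64):
    """ECFP-like count vector: environments grow outward from each atom."""
    neighbours = [[] for _ in atoms]
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))
    return refine_and_count(list(atoms), neighbours, radius, n_bits)


def bond_centered(atoms, bonds, radius=2, n_bits=64):
    """Bond-centered count vector: environments grow outward from each bond.
    Two bonds are neighbours when they share an atom (simple graphs only)."""
    labels = [min(atoms[a], atoms[b]) + o + max(atoms[a], atoms[b]) for a, b, o in bonds]
    neighbours = [[] for _ in bonds]
    for i, (a, b, _) in enumerate(bonds):
        for j, (c, d, _) in enumerate(bonds):
            shared = {a, b} & {c, d}
            if i != j and shared:
                # Tag the link with the symbol of the shared atom.
                neighbours[i].append((j, "@" + atoms[min(shared)] + ":"))
    return refine_and_count(labels, neighbours, radius, n_bits)


# Toy example: ethanol heavy atoms C-C-O with two single bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1, "-"), (1, 2, "-")]

# Concatenate the two views into one feature vector.
fingerprint = atom_centered(atoms, bonds) + bond_centered(atoms, bonds)
```

The concatenated vector could then be passed to any tabular learner, such as the Random Forest or XGBoost models mentioned above. In practice a cheminformatics toolkit would supply the molecular graph and a production-grade ECFP implementation.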
Oct-7-2025