CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction
Van Le, Huong, Ren, Weibin, Kim, Junhong, Yun, Yukyung, Park, Young Bin, Kim, Young Jun, Han, Bok Kyung, Choi, Inho, Park, Jong IL, Yun, Hwi-Yeol, Choi, Jae-Mun
–arXiv.org Artificial Intelligence
ABSTRACT
Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To improve the precision and efficiency of computational predictions, we systematically investigated the impact of eight types of molecular feature representation, including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings, combined with automated machine learning (AutoML) techniques to predict Caco-2 permeability. Using two datasets of differing scale and diversity (the TDC benchmark and curated OCHEM data), we assessed model performance across representations and identified PaDEL, Mordred, and RDKit descriptors as particularly effective for Caco-2 prediction. Notably, the AutoML-based model CaliciBoost achieved the best MAE performance. Furthermore, for both PaDEL and Mordred representations, the incorporation of 3D descriptors resulted in a 15.73% reduction in MAE compared to using 2D features alone, as confirmed by feature importance analysis. These findings highlight the effectiveness of AutoML approaches in ADMET modeling and offer practical guidance for feature selection in data-limited prediction tasks.

INTRODUCTION
Caco-2 cell permeability is a widely used in vitro proxy for assessing the intestinal absorption of drug candidates in early-stage drug discovery.
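The abstract compares representations by MAE and reports a relative reduction when 3D descriptors are added. The short Python sketch below shows one way such a comparison could be computed. It assumes the reduction is defined as (MAE_2D - MAE_2D+3D) / MAE_2D, which the abstract does not state explicitly. The function names and all numeric values are hypothetical and for illustration only; they are not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline, mae_new):
    """Percent reduction of mae_new relative to mae_baseline (assumed definition)."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_new) / mae_baseline


# Hypothetical permeability values (illustrative only, not from the paper).
measured = [-5.0, -4.5, -6.0, -5.5]
pred_2d_only = [-5.4, -4.1, -6.4, -5.1]     # model using 2D descriptors only
pred_2d_plus_3d = [-5.3, -4.2, -6.3, -5.2]  # model using 2D + 3D descriptors

mae_2d = mean_absolute_error(measured, pred_2d_only)
mae_3d = mean_absolute_error(measured, pred_2d_plus_3d)
print(f"MAE (2D only): {mae_2d:.3f}")
print(f"MAE (2D + 3D): {mae_3d:.3f}")
print(f"Relative reduction: {relative_mae_reduction(mae_2d, mae_3d):.2f}%")
```

Expressing the gain relative to the 2D baseline makes it comparable across descriptor sets whose absolute MAE values differ.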
Jun-11-2025
- Country:
  - Asia > South Korea > Gyeongsangbuk-do (0.04)
- Genre:
  - Research Report > New Finding (0.69)