Zhang, Lecheng
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning
Liu, Sizhe, Xia, Jun, Zhang, Lecheng, Liu, Yuchen, Liu, Yue, Du, Wenjie, Gao, Zhangyang, Hu, Bozhen, Tan, Cheng, Xiang, Hongxin, Li, Stan Z.
Molecular relational learning (MRL) is crucial for understanding the interaction behaviors between molecular pairs, a critical aspect of drug discovery and development. However, the large feasible model space of MRL poses significant challenges to benchmarking, and existing MRL frameworks face limitations in flexibility and scope. To address these challenges, avoid repetitive coding efforts, and ensure fair comparison of models, we introduce FlexMol, a comprehensive toolkit designed to facilitate the construction and evaluation of diverse model architectures across various datasets and performance metrics. FlexMol offers a robust suite of preset model components, including 16 drug encoders, 13 protein sequence encoders, 9 protein structure encoders, and 7 interaction layers. With its easy-to-use API and flexibility, FlexMol supports the dynamic construction of over 70,000 distinct combinations of model architectures. Additionally, we provide detailed benchmark results and code examples to demonstrate FlexMol's effectiveness in simplifying and standardizing MRL model development and comparison.
Why Deep Models Often cannot Beat Non-deep Counterparts on Molecular Property Prediction?
Xia, Jun, Zhang, Lecheng, Zhu, Xiao, Li, Stan Z.
Molecular property prediction (MPP) is a crucial task in the drug discovery pipeline, which has recently gained considerable attention thanks to advances in deep neural networks. However, recent research has revealed that deep models struggle to beat traditional non-deep ones on MPP. In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 14 molecule datasets. Through the most comprehensive study to date, we make the following key observations:
Specifically, the Multi-Layer Perceptron (MLP) could be applied to computed or handcrafted molecular fingerprints; Sequence-based neural architectures including Recurrent Neural Networks (RNNs) (Medsker & Jain, 1999), 1D Convolutional Neural Networks (1D CNNs) (Gu et al., 2018), and Transformers (Honda et al., 2019; Rong et al., 2020) are exploited to encode molecules represented in Simplified Molecular-Input Line-Entry System (SMILES) strings (Weininger et al., 1989). Later, it is argued that molecules can be naturally represented in graph structures with atoms as nodes and bonds as edges.
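
The graph view mentioned above (atoms as nodes, bonds as edges) can be illustrated with a minimal sketch in plain Python. This is not code from the paper: the molecule (ethanol, SMILES "CCO"), the hand-written atom and bond lists, and the helper name `build_adjacency` are all assumptions chosen for illustration. In practice a cheminformatics toolkit would typically parse the SMILES string instead.

```python
# Hypothetical hand-written description of ethanol (SMILES "CCO").
# Only heavy atoms are listed; hydrogens are left implicit.
atoms = ["C", "C", "O"]           # node labels, indexed 0..2
bonds = [(0, 1, 1), (1, 2, 1)]    # (atom_i, atom_j, bond_order)


def build_adjacency(num_atoms, bonds):
    """Return an undirected adjacency list: node -> [(neighbor, bond_order)]."""
    adjacency = {i: [] for i in range(num_atoms)}
    for i, j, order in bonds:
        # Bonds are undirected, so record the edge in both directions.
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))
    return adjacency


adjacency = build_adjacency(len(atoms), bonds)

# Node degree (number of bonded heavy atoms) is a simple per-node feature.
degrees = [len(adjacency[i]) for i in range(len(atoms))]
```

A graph-based model would consume node labels such as `atoms` and edge lists such as `bonds`, whereas a sequence model would read the SMILES string character by character.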