
Neural Information Processing Systems 

In this section, we provide detailed information on our benchmark and dataset construction process, including the data source description, dataset composition, filtering strategies, and the rationale for dataset construction.

Chemical reaction data are collected separately from patent databases, including USPTO [19], Pistachio [37], and Reaxys [8]. For reaction mechanism annotation, we followed the processing pipeline described in [26].

A.2 Dataset Composition and Filtering Strategies

Molecular Samples (25% of Benchmark): Although the ZINC database contains 250,000 molecules, we observed that its molecular weight distribution is relatively concentrated. To ensure diversity, we carefully selected molecules from PubChem, ChEMBL, and ZINC based on molecular weight and structural complexity.
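One way to picture weight-based selection is stratified sampling over molecular-weight bins, so that no single weight range dominates the selected set. The following is a minimal sketch under stated assumptions, not the pipeline used for the benchmark: the function name `stratified_sample_by_weight`, the bin width, the per-bin quota, and the example pool are all hypothetical, molecular weights are assumed to be precomputed, and structural complexity is not modeled here.

```python
import random
from collections import defaultdict


def stratified_sample_by_weight(molecules, bin_width=100.0, per_bin=1, seed=0):
    """Pick up to `per_bin` molecules from each molecular-weight bin.

    `molecules` is a list of (smiles, molecular_weight) pairs; the weights
    are assumed to be precomputed (hypothetical input format).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group molecules by the index of the weight bin they fall into.
    bins = defaultdict(list)
    for smiles, weight in molecules:
        bins[int(weight // bin_width)].append((smiles, weight))

    # Draw the same quota from every non-empty bin, lightest bin first.
    selected = []
    for key in sorted(bins):
        members = bins[key]
        selected.extend(rng.sample(members, min(per_bin, len(members))))
    return selected


# Illustrative pool (approximate molecular weights in g/mol).
pool = [
    ("CCO", 46.07),                           # ethanol
    ("c1ccccc1", 78.11),                      # benzene
    ("CC(=O)Oc1ccccc1C(=O)O", 180.16),        # aspirin
    ("Cn1cnc2c1c(=O)n(C)c(=O)n2C", 194.19),   # caffeine
    ("CC(C)Cc1ccc(cc1)C(C)C(=O)O", 206.28),   # ibuprofen
]

picked = stratified_sample_by_weight(pool, bin_width=100.0, per_bin=1)
for smiles, weight in picked:
    print(f"{weight:7.2f}  {smiles}")
```

With a 100 g/mol bin width, the pool above falls into three bins, so one molecule is drawn from each. A concentrated source such as the one described would fill only a few bins, which is why drawing from several databases widens the covered range.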
