Fast Leave-One-Out Approximation from Fragment-Target Prevalence Vectors (molFTP): From Dummy Masking to Key-LOO for Leakage-Free Feature Construction
–arXiv.org Artificial Intelligence
A fundamental question for users of predictive models is: how good is the training data (1)? One way to approach this is by delineating the model's applicability domain. A second safeguard is to prevent data leakage (2), which motivates deduplication and proper validation protocols (3). In practice, it is standard to use cross-validation or a time-series split such as SIMPD (4). Beyond sample-level leakage (molecules crossing folds), we must also consider feature leakage (when features inadvertently encode information about held-out molecules) (2). We return to this point in the Related Work section.
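To make the idea of feature leakage concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the molFTP implementation: the function names, the toy `keys` (standing in for fragment identifiers), and the `prior` fallback value are all hypothetical. It contrasts a target-prevalence feature computed over all rows, which lets each row's own label enter its feature, with a generic leave-one-out variant that excludes the row itself.

```python
from collections import defaultdict


def _key_stats(keys, labels):
    """Accumulate the label sum and row count for each key."""
    total = defaultdict(float)
    count = defaultdict(int)
    for k, y in zip(keys, labels):
        total[k] += y
        count[k] += 1
    return total, count


def naive_prevalence(keys, labels):
    """Mean label per key over ALL rows; each row's own label leaks in."""
    total, count = _key_stats(keys, labels)
    return [total[k] / count[k] for k in keys]


def loo_prevalence(keys, labels, prior=0.5):
    """Mean label per key excluding the row itself.

    `prior` is a hypothetical fallback for keys seen only once,
    where no other row is left to estimate from.
    """
    total, count = _key_stats(keys, labels)
    out = []
    for k, y in zip(keys, labels):
        n_others = count[k] - 1
        out.append((total[k] - y) / n_others if n_others > 0 else prior)
    return out


# Toy data: key "b" occurs once, so its naive feature equals its label.
keys = ["a", "a", "a", "b"]
labels = [1, 0, 1, 1]

naive = naive_prevalence(keys, labels)
loo = loo_prevalence(keys, labels)
```

In this toy example the naive feature for the singleton key `"b"` reproduces its label exactly, which is the kind of leakage the paragraph above warns about; the leave-one-out variant replaces it with the prior.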
Oct-8-2025