Cross-Validation for Correlated Data
Rabinowicz, Assaf, Rosset, Saharon
Datasets with correlation structures are common in modern statistical applications in various fields, such as geostatistics (Goovaerts 1999), genetics (Maddison 1990) and ecology (Roberts et al. 2017). Different modeling methods address the correlation structure differently. Some modeling methods, such as Gaussian process regression (Rasmussen and Williams 2006, GPR) and generalized least squares (Hansen 2007, GLS), utilize explicitly the correlation structure for achieving better prediction accuracy. Other predictive models, like random forest (Breiman 2001, RF), gradient boosting machines (Friedman 2002, GBM) and other machine learning models, do not consider explicitly the correlation structure but are still potentially able to utilize the correlation implicitly. The analysis in this paper mainly focuses on correlation that appears due to latent objects, such as random effects and random fields as appear in generalized linear mixed models (Verbeke 1997, GLMM) and generalized Gaussian process regression (Rasmussen and Williams 2006, GGPR) in clustered, temporal and spatial datasets.
Apr-24-2020
- Country:
- North America > United States
- California (0.04)
- New York > New York County
- New York City (0.04)
- Asia > Middle East
- Jordan (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- North America > United States
- Genre:
- Research Report (1.00)