The out-of-sample $R^2$: estimation and inference

Hawinkel, Stijn; Waegeman, Willem; Maere, Steven

Feb-10-2023, arXiv.org Machine Learning

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessing the prediction error. For this reason, out-of-sample performance is commonly estimated using data-splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of explained variance to total variance can be summarized by the coefficient of determination, or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. In contrast to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, which hampers formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data-splitting estimates to provide a standard error for $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for the prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data.
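
As a rough illustration of treating the out-of-sample $R^2$ as a comparison of two predictive models, the Python sketch below estimates $R^2_{\mathrm{oos}} = 1 - \mathrm{MSE}_{\mathrm{model}} / \mathrm{MSE}_{\mathrm{mean}}$ by K-fold cross-validation, where the reference model predicts the training-fold mean of the outcome. This is a sketch under stated assumptions, not the authors' estimator: the choice of reference model, the pooling of squared errors over all held-out observations, and every helper name (fit_ols_1d, fit_mean, kfold_indices, cv_mse, oos_r2) are hypothetical, and no claim of unbiasedness is made for it. A standard error is deliberately not computed, since that is the part the abstract says relies on dedicated theory for data-splitting estimates.

```python
import random
from typing import Callable, List, Sequence

# A fitted model maps a single predictor value x to a prediction of y.
Predictor = Callable[[float], float]


def fit_ols_1d(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical example model: least-squares line y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # constant x: fall back to a flat line
    a = my - b * mx
    return lambda x: a + b * x


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Assumed reference model: always predict the training-set mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(xs, ys, fit, folds) -> float:
    """Mean squared prediction error pooled over all held-out observations."""
    total = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Refit on the training part only, then score the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test)
    return total / len(xs)


def oos_r2(xs, ys, fit, k: int = 5, seed: int = 0) -> float:
    """Sketch of an out-of-sample R^2: 1 - MSE(model) / MSE(mean model)."""
    if k < 2:
        raise ValueError("need at least 2 folds so every training set is non-empty")
    folds = kfold_indices(len(xs), k, seed)  # same splits for both models
    mse_model = cv_mse(xs, ys, fit, folds)
    mse_ref = cv_mse(xs, ys, fit_mean, folds)
    if mse_ref == 0.0:
        raise ValueError("reference MSE is zero; R^2 is undefined")
    return 1.0 - mse_model / mse_ref
```

For example, when the outcomes are an exact linear function of x, oos_r2(xs, ys, fit_ols_1d) is essentially 1, while passing fit_mean as the model returns 0 by construction; a negative value would mean the model predicts held-out data worse than the training mean does. Scoring both models on the same folds keeps the comparison paired.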

artificial intelligence, machine learning, modeling & simulation

Country:
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Portugal > Braga
    - Braga (0.04)
  - Germany > Lower Saxony
    - Gottingen (0.04)

Genre:
- Research Report > Experimental Study (0.95)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:
- Information Technology
  - Modeling & Simulation (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (1.00)
    - Performance Analysis (0.68)
