Test Set Sizing for the Ridge Regression

Dubbs, Alexander

arXiv.org Machine Learning 

The question of how to divide one's data into a training set and a test set has long been of theoretical and practical interest to data scientists. While many results have been proved bounding different types of error for broad classes of models, no precise results have been found for any machine learning model using philosophically appealing metrics of success that do not depend on artificial tuning parameters. This paper finds the train/test split for the ridge regression to high accuracy using a two-term asymptotic formula independent of its tuning parameter, α, using the Integrity Metric (IM) introduced for the plain vanilla linear regression by the author in [2]. The IM measures the degree to which the measured model error differs from the true error of the model, and this quantity should always be minimized to gain an honest assessment of a model's performance. We pick the number of points p in the training set to minimize the IM. Note that we do not pick p to maximize the measured model accuracy, since then we would derive an assessment of the model's ability that is not truthful. Our main result is:

Theorem 6. Let X be an m × n matrix of normals with independent rows with covariance Σ.
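The idea of choosing p so that the measured error stays close to the true error can be illustrated with a small simulation. The sketch below is not the paper's Integrity Metric or its asymptotic formula: it uses a hypothetical stand-in, the mean squared gap between the held-out error and the true out-of-sample error, and it assumes Σ = I, Gaussian noise, and a randomly drawn coefficient vector. The function names `ridge_fit`, `mean_squared_gap`, and `best_train_size` are illustrative only.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge estimate: solve (X^T X + alpha I) b = X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)


def mean_squared_gap(m, n, p, alpha, sigma=1.0, trials=200, seed=0):
    """Monte Carlo estimate of E[(measured error - true error)^2].

    Assumption: this quantity is only a proxy for "how far the measured
    error is from the true error"; the paper's IM may be defined differently.
    Rows of X are drawn with identity covariance (Sigma = I).
    """
    rng = np.random.default_rng(seed)
    gaps = np.empty(trials)
    for t in range(trials):
        beta = rng.standard_normal(n)              # hypothetical true coefficients
        X = rng.standard_normal((m, n))            # m rows, n features, Sigma = I
        y = X @ beta + sigma * rng.standard_normal(m)

        b_hat = ridge_fit(X[:p], y[:p], alpha)     # train on the first p rows

        # Measured error: mean squared error on the m - p held-out rows.
        measured = np.mean((y[p:] - X[p:] @ b_hat) ** 2)

        # True error on a fresh point when Sigma = I:
        # E[(y - x.b_hat)^2] = ||beta - b_hat||^2 + sigma^2.
        true = np.sum((beta - b_hat) ** 2) + sigma ** 2

        gaps[t] = (measured - true) ** 2
    return float(gaps.mean())


def best_train_size(m, n, alpha, candidates, **kwargs):
    """Return the candidate p with the smallest simulated gap."""
    scores = [mean_squared_gap(m, n, p, alpha, **kwargs) for p in candidates]
    return int(candidates[int(np.argmin(scores))])
```

A call such as `best_train_size(m=200, n=5, alpha=1.0, candidates=list(range(10, 200, 10)))` scans a grid of training sizes. The result depends on the simulated setup and the proxy metric, so it should not be read as a check of the paper's theorem.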
