Test Set Sizing for the Ridge Regression

Apr-27-2025–arXiv.org Machine Learning

The question of how to divide one's data into a training set and a test set has long been of theoretical and practical interest to data scientists. While many results have been proved bounding different types of error for broad classes of models, no precise results have been found for any machine learning model using philosophically appealing metrics of success that do not depend on artificial tuning parameters. This paper finds the train/test split for the ridge regression to high accuracy via a two-term asymptotic formula that is independent of its tuning parameter α, using the Integrity Metric (IM) introduced for the plain vanilla linear regression by the author in [2]. The IM measures the degree to which the measured model error differs from the true error of the model, and this quantity should always be minimized to gain an honest assessment of a model's performance. We pick the number of points p in the training set to minimize the IM. Note that we do not pick p to maximize the measured model accuracy, since then we would derive an assessment of the model's ability that is not truthful. Our main result is: Theorem 6. Let X be an m × n matrix of normals with independent rows with covariance Σ.
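As a rough illustration of picking p to minimize the gap between measured and true error, the Python sketch below estimates that gap by Monte Carlo simulation for a few candidate training sizes. It is not the paper's asymptotic formula, and the squared gap used here is only an assumed stand-in for the Integrity Metric, whose exact definition is given in [2]. The function names (`ridge_fit`, `integrity_gap`, `best_split`), the identity covariance, the Gaussian noise model, and all default parameter values are hypothetical choices made for this example.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)


def integrity_gap(m, n, p, alpha=1.0, noise=1.0, trials=200, seed=0):
    """Monte Carlo estimate of E[(measured test MSE - true MSE)^2].

    Assumption: this squared gap is a stand-in for the IM, not its
    definition from the paper. Rows of X are drawn with identity
    covariance (the paper allows a general covariance Sigma).
    """
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        beta = rng.standard_normal(n)            # hypothetical true coefficients
        X = rng.standard_normal((m, n))          # m rows, n features
        y = X @ beta + noise * rng.standard_normal(m)
        b_hat = ridge_fit(X[:p], y[:p], alpha)   # train on the first p rows
        # Error measured on the held-out m - p rows.
        measured = np.mean((y[p:] - X[p:] @ b_hat) ** 2)
        # Expected squared error on a fresh point when the covariance is I.
        true_err = np.sum((b_hat - beta) ** 2) + noise ** 2
        gaps.append((measured - true_err) ** 2)
    return float(np.mean(gaps))


def best_split(m, n, candidates, **kwargs):
    """Return the candidate p with the smallest estimated gap, plus all scores."""
    scores = {p: integrity_gap(m, n, p, **kwargs) for p in candidates}
    return min(scores, key=scores.get), scores
```

For example, `best_split(60, 5, [10, 30, 50], trials=50)` returns the candidate training size with the smallest estimated gap together with the score for each candidate. The result depends on the simulated setting and on the random seed, so it should be read as a sanity check rather than a substitute for the paper's formula.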

artificial intelligence, machine learning, test set sizing
