A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

Sabourin, Jeremy, Valdar, William, Nobel, Andrew

arXiv.org Machine Learning 

The analysis of high dimensional data, in which the number of measured predictors is large and can exceed the number of samples, is an important and common problem in statistical applications. When samples are accompanied by a real or categorical response, data analysis typically includes model fitting with the aim of prediction, variable selection, or both. The goal of prediction is to derive a rule capable of accurately predicting the response of a new, unlabeled sample. The goal of variable selection is to select a (small) subset of the measured predictors whose individual or coordinated activity is significantly related to the response. In both cases, it is common to assume that the observed data arise from an underlying model that is sparse, in the sense that only a small subset of the predictors is related to the response. Whether sparsity is assumed or simply viewed as a desirable feature of a model, analysis of high dimensional data is often carried out by penalized methods that produce models in which a relatively small subset of the available predictors is included. Popular penalized methods include the LASSO (Tibshirani, 1996), its numerous variations, and SCAD (Fan and Li, 2001). In what follows, we focus our attention on the LASSO. The LASSO and its variants require specification of a penalty/tuning parameter that controls the tradeoff between model fit and model size.
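The abstract does not spell out the procedure, but one natural way a permutation approach can calibrate the LASSO penalty is sketched below as an assumption, not as the paper's exact method. Permuting the response destroys any association with the predictors; for the objective (1/2n)||y - Xb||^2 + lambda*||b||_1, the smallest lambda that forces every coefficient to zero has the closed form max|X^T y|/n, so a reference distribution for lambda can be built without fitting any models. The helper name `permutation_lambda` and the quantile rule are illustrative choices.

```python
import numpy as np

def permutation_lambda(X, y, n_perm=100, quantile=0.5, rng=None):
    """Illustrative permutation-based choice of the LASSO penalty.

    For each permutation of y, compute the smallest lambda at which the
    LASSO solution is identically zero (max |X^T y_perm| / n for the
    objective (1/2n)||y - Xb||^2 + lambda*||b||_1), then return a
    quantile of that null distribution. This is a sketch, not the
    procedure defined in the paper.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    lams = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)          # break the X-y association
        lams[i] = np.max(np.abs(X.T @ y_perm)) / n
    return np.quantile(lams, quantile)

# Hypothetical usage on synthetic data:
gen = np.random.default_rng(0)
X = gen.standard_normal((100, 500))          # p = 500 predictors, n = 100
beta = np.zeros(500); beta[:5] = 2.0          # sparse truth: 5 active
y = X @ beta + gen.standard_normal(100)
lam = permutation_lambda(X, y, n_perm=200, rng=1)
```

Any lambda at or above the chosen quantile suppresses (that fraction of) the spurious selections seen under the permutation null, which is one way to trade model fit against model size without cross-validation.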
