The Lasso under Heteroscedasticity

Jia, Jinzhu, Rohe, Karl, Yu, Bin

arXiv.org Machine Learning 

Preprint

Jinzhu Jia (1), Karl Rohe (1) and Bin Yu (1, 2)
Department of Statistics (1) and Department of EECS (2), University of California, Berkeley

Abstract: The performance of the Lasso is well understood under the assumptions of the standard linear model with homoscedastic noise. However, in several applications, the standard model does not describe the important features of the data. This paper examines how the Lasso performs on a nonstandard model that is motivated by medical imaging applications. As in all heteroscedastic models, the noise terms in this Poisson-like model are not independent of the design matrix. More specifically, this paper studies the sign consistency of the Lasso under a sparse Poisson-like model. In addition to studying sufficient conditions for the sign consistency of the Lasso estimate, this paper also gives necessary conditions for sign consistency. Both sets of conditions are comparable to results for the homoscedastic model, showing that when a measure of the signal-to-noise ratio is large, the Lasso performs well on both Poisson-like data and homoscedastic data. Simulations reveal that the Lasso performs equally well at model selection on both Poisson-like data and homoscedastic data (with properly scaled noise variance), across a range of parameterizations. Taken as a whole, these results suggest that the Lasso is robust to the Poisson-like heteroscedastic noise.

Key words and phrases: Lasso, Poisson-like Model, Sign Consistency, Heteroscedasticity

1 Introduction

The Lasso (Tibshirani, 1996) is widely used in high-dimensional regression for variable selection. Its model selection performance has been well studied under a standard sparse and homoscedastic regression model.
Several researchers have shown that under sparsity and regularity conditions, the Lasso can select the true model asymptotically even when $p \gg n$ (Donoho et al., 2006; Meinshausen ...).

arXiv:1011.1026v1

To define the Lasso estimate, suppose the observed data are independent pairs $\{(x_i, Y_i)\} \in \mathbb{R}^p \times \mathbb{R}$ for $i = 1, 2, \ldots, n$ following the linear regression model

$$Y_i = x_i^T \beta + \epsilon_i, \tag{1}$$

where $x_i^T$ is a row vector representing the predictors for the $i$th observation, $Y_i$ is the corresponding $i$th response variable, the $\epsilon_i$'s are independent and mean zero noise terms, and $\beta \in \mathbb{R}^p$. Let $Y = (Y_1, \ldots, Y_n)^T$ and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T \in \mathbb{R}^n$, and let $X$ denote the $n \times p$ design matrix whose $i$th row is $x_i^T$. The Lasso estimate (Tibshirani, 1996) is then defined as the solution to a penalized least squares problem (with regularization parameter $\lambda$):

$$\hat{\beta}(\lambda) = \arg\min_{\beta} \frac{1}{2n} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1, \tag{2}$$

where for some vector $x \in \mathbb{R}^k$, $\|x\|_r = \left( \sum_{i=1}^{k} |x_i|^r \right)^{1/r}$.
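
As an illustration of the penalized least squares problem (2), the following minimal Python sketch evaluates the Lasso objective and minimizes it by cyclic coordinate descent with soft-thresholding. Coordinate descent is a standard solver chosen here for illustration; it is not taken from the paper. The function names (`lr_norm`, `lasso_objective`, `soft_threshold`, `lasso_cd`, `demo`) and the synthetic homoscedastic data in `demo` are hypothetical and serve only to make the example self-contained.

```python
import random


def lr_norm(x, r):
    """l_r norm of a vector: (sum_i |x_i|^r)^(1/r)."""
    return sum(abs(v) ** r for v in x) ** (1.0 / r)


def lasso_objective(X, Y, beta, lam):
    """Objective in (2): (1/(2n)) * ||Y - X beta||_2^2 + lam * ||beta||_1."""
    n = len(Y)
    residual = [y - sum(a * b for a, b in zip(row, beta)) for row, y in zip(X, Y)]
    return lr_norm(residual, 2) ** 2 / (2 * n) + lam * lr_norm(beta, 1)


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, Y, lam, n_sweeps=200):
    """Minimize the objective in (2) by cyclic coordinate descent.

    X is a list of n rows (each of length p), Y a list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(Y)  # equals Y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) * <x_j, partial residual with coordinate j removed>
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def demo(seed=0):
    """Fit on synthetic data (assumed setup: Gaussian design, homoscedastic noise)."""
    rng = random.Random(seed)
    n, true_beta = 100, [2.0, -1.5, 0.0, 0.0, 0.0]
    X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
    Y = [sum(a * b for a, b in zip(row, true_beta)) + rng.gauss(0, 0.5) for row in X]
    beta_hat = lasso_cd(X, Y, lam=0.1)
    signs = [(b > 0) - (b < 0) for b in beta_hat]  # estimated sign pattern
    return beta_hat, signs


if __name__ == "__main__":
    beta_hat, signs = demo()
    print("estimate:", [round(b, 3) for b in beta_hat])
    print("signs:   ", signs)
```

The sign pattern returned by `demo` is the quantity compared against the signs of the true coefficients when assessing sign consistency. The $1/(2n)$ scaling of the squared error matters when comparing values of $\lambda$ across implementations, since other conventions rescale the penalty.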
