Pötscher, Benedikt M., Schneider, Ulrike

We study the distribution of the adaptive LASSO estimator (Zou (2006)) in finite samples as well as in the large-sample limit. The large-sample distributions are derived both for the case where the adaptive LASSO estimator is tuned to perform conservative model selection and for the case where the tuning results in consistent model selection. We show that the finite-sample as well as the large-sample distributions are typically highly non-normal, regardless of the choice of the tuning parameter. The uniform convergence rate is also obtained, and is shown to be slower than $n^{-1/2}$ in case the estimator is tuned to perform consistent model selection. In particular, these results question the statistical relevance of the 'oracle' property of the adaptive LASSO estimator established in Zou (2006). Moreover, we also provide an impossibility result regarding the estimation of the distribution function of the adaptive LASSO estimator. The theoretical results, which are obtained for a regression model with orthogonal design, are complemented by a Monte Carlo study using non-orthogonal regressors.
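
To give some intuition for the non-normality described in this abstract, here is a minimal sketch; it is not the authors' code. It assumes the simplest orthogonal setting, a one-parameter Gaussian location model with least-squares estimate `z`, an adaptive weight of `1/abs(z)`, and a hypothetical tuning parameter `eta`. Under those assumptions, minimizing $(\theta - z)^2 + 2\eta^2|\theta|/|z|$ gives the closed form $z(1 - \eta^2/z^2)_+$, which sets the estimate exactly to zero whenever $|z| \le \eta$. The values of `theta`, `n`, `eta` and `reps` are illustrative choices only.

```python
import random

def adaptive_lasso_1d(z, eta):
    """Minimizer of (theta - z)**2 + 2 * eta**2 * abs(theta) / abs(z).

    Assumed setting: one parameter, adaptive weight 1/abs(z).
    """
    if z == 0.0:
        return 0.0
    shrink = 1.0 - (eta * eta) / (z * z)
    # Positive-part rule: the estimate is exactly zero when abs(z) <= eta.
    return z * shrink if shrink > 0.0 else 0.0

def simulate(theta, n, eta, reps, seed=0):
    """Draw the least-squares estimate z ~ N(theta, 1/n) and apply the rule."""
    rng = random.Random(seed)
    sd = 1.0 / n ** 0.5
    return [adaptive_lasso_1d(rng.gauss(theta, sd), eta) for _ in range(reps)]

# Illustrative (hypothetical) settings: true parameter close to the threshold.
draws = simulate(theta=0.2, n=100, eta=0.2, reps=2000)
zero_fraction = sum(d == 0.0 for d in draws) / len(draws)
print(f"share of estimates exactly zero: {zero_fraction:.2f}")
```

A sizeable share of the simulated estimates sits exactly at zero while the rest are spread continuously, so the sampling distribution is a mixture of a point mass and a continuous part rather than anything close to a normal curve.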

Basu, Pallavi, Feng, Yang, Lv, Jinchi

Model selection is indispensable to high-dimensional sparse modeling in selecting the best set of covariates among a sequence of candidate models. Most existing work assumes implicitly that the model is correctly specified or of fixed dimensions. Yet model misspecification and high dimensionality are common in real applications. In this paper, we investigate two classical Kullback-Leibler divergence and Bayesian principles of model selection in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability which involves a logarithmic factor of the dimensionality in penalizing model complexity. We further establish the consistency of the covariance contrast matrix estimator in a general setting. Our results and new method are supported by numerical studies.

Zhang, Michael Minyi, Lam, Henry, Lin, Lizhen

Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge one may encounter is the presence of outliers and contamination that damage the inference quality. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, this method aggregates the posterior model probabilities or other model/variable selection criteria to obtain a final model by using the notion of geometric median. This approach leads to improved concentration in finding the "correct" model and model parameters and is also provably robust to outliers and data contamination.
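
The aggregation step in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses the standard Weiszfeld iteration to approximate the geometric median, and the per-subset posterior model probabilities in `subset_probs` are made-up numbers, with the last subset deliberately contaminated.

```python
import math

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of equal-length vectors (Weiszfeld)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(max_iter):
        # Weight each point by the inverse of its distance to the iterate;
        # the floor avoids division by zero if the iterate hits a data point.
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[j] for w, p in zip(weights, points)) / total
            for j in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current

# Hypothetical posterior probabilities over three candidate models,
# one vector per data subset; the last subset is contaminated.
subset_probs = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.82, 0.12, 0.06],
    [0.78, 0.17, 0.05],
    [0.02, 0.03, 0.95],
]
median_probs = geometric_median(subset_probs)
mean_probs = [sum(p[j] for p in subset_probs) / len(subset_probs) for j in range(3)]
selected = max(range(3), key=lambda j: median_probs[j])
print("median:", [round(x, 3) for x in median_probs])
print("mean:  ", [round(x, 3) for x in mean_probs])
```

In this toy input the plain average is pulled toward the contaminated subset, while the geometric median stays close to the four subsets that agree, which is the robustness property the abstract refers to.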

Model selection is an important task in machine learning and data mining. When using the holdout testing method to do model selection, a consensus in the machine learning community is that the same model selection goal should be used to identify the best model based on available data. However, following the preliminary work of Rosset (2004), we show that this is, in general, not true under highly uncertain situations where only very limited data are available. We thoroughly investigate the model selection abilities of different measures under highly uncertain situations as we vary model selection goals, learning algorithms and class distributions. The experimental results show that a measure's model selection ability is relatively stable across model selection goals and class distributions. However, different learning algorithms call for different measures for model selection. For SVM and KNN, the measures RMS, SAUC and MXE generally perform best. For decision trees and naive Bayes, the measures RMS, SAUC, MXE, AUC and APR generally perform best.

So I understand that variable selection is a part of model selection. But what exactly does model selection consist of? I ask this because I am reading an article Burnham & Anderson: AIC vs BIC where they talk about AIC and BIC in model selection. Reading this article I realize I have been thinking of 'model selection' as 'variable selection' (ref. An excerpt from the article where they talk about 12 models with increasing degrees of "generality" and these models show "tapering effects" (Figure 1) when KL-Information is plotted against the 12 models: DIFFERENT PHILOSOPHIES AND TARGET MODELS ... Despite that the target of BIC is a more general model than the target model for AIC, the model most often selected here by BIC will be less general than Model 7 unless n is very large.
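
To make the AIC/BIC comparison in the excerpt concrete, here is a rough sketch using the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The three candidate models, their log-likelihoods, and the sample size are hypothetical numbers I made up for illustration; they are not taken from the Burnham & Anderson article.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical nested candidates: (name, number of parameters, max log-likelihood).
n = 100
candidates = [
    ("small", 2, -150.0),
    ("medium", 4, -146.5),
    ("large", 8, -145.0),
]

# Lower is better for both criteria.
best_aic = min(candidates, key=lambda m: aic(m[2], m[1]))[0]
best_bic = min(candidates, key=lambda m: bic(m[2], m[1], n))[0]
print("AIC picks:", best_aic)
print("BIC picks:", best_bic)
```

With these made-up inputs the two criteria disagree: the ln(n) penalty in BIC is heavier than the constant 2 in AIC once n exceeds about 7, so BIC leans toward the less general candidate, which is the kind of behaviour the quoted passage describes.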