 Cross Validation


Arlot, Celisse: A survey of cross-validation procedures for model selection

@machinelearnbot

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey intends to relate these results to the most recent advances in model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
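As a concrete reference point for the procedures surveyed above, here is a minimal sketch of plain k-fold cross-validation as a risk estimator; the callbacks `fit`, `predict`, and `loss` are placeholders of our own, not anything prescribed by the survey.

```python
import numpy as np

def kfold_risk(fit, predict, loss, X, y, k=10, seed=0):
    """Plain k-fold cross-validation: average held-out loss over k splits.
    `fit`, `predict`, and `loss` are user-supplied callbacks."""
    idx = np.random.default_rng(seed).permutation(len(y))
    risks = []
    for f in np.array_split(idx, k):
        train = np.setdiff1d(idx, f)          # everything outside the held-out fold
        model = fit(X[train], y[train])
        risks.append(loss(y[f], predict(model, X[f])))
    return float(np.mean(risks))

# Usage with ordinary least squares as the estimator.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -1.0, 0.0]) + rng.normal(size=300)
risk = kfold_risk(
    fit=lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0],
    predict=lambda beta, X: X @ beta,
    loss=lambda y, yhat: float(np.mean((y - yhat) ** 2)),
    X=X, y=y, k=10,
)
print("estimated risk (MSE):", risk)
```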


Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce

arXiv.org Machine Learning

In this paper, we propose a one-pass algorithm on MapReduce for penalized linear regression \[f_\lambda(\alpha, \beta) = \|Y - \alpha\mathbf{1} - X\beta\|_2^2 + p_{\lambda}(\beta)\] where $\alpha$ is the intercept, which can be omitted depending on the application; $\beta$ is the coefficient vector; and $p_{\lambda}$ is the penalty function with penalty parameter $\lambda$. $f_\lambda(\alpha, \beta)$ includes interesting classes such as the Lasso, ridge regression, and the elastic net. Compared to the latest iterative distributed algorithms, which require multiple MapReduce jobs, our algorithm achieves a large performance improvement; moreover, it is exact, unlike approximate algorithms such as parallel stochastic gradient descent. What further distinguishes our algorithm from others is that it trains the model with cross-validation to choose the optimal $\lambda$ instead of a user-specified one. Key words: penalized linear regression, lasso, elastic-net, ridge, MapReduce
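The abstract does not spell out the algorithm, but for ridge regression the one-pass idea can be illustrated on a single machine: one pass accumulates per-fold Gram matrices and cross-products (the map phase in MapReduce terms), after which any $\lambda$ can be evaluated by CV without touching the data again. The sketch below is our own illustration under stated assumptions (ridge only, no intercept; the Lasso and elastic net would need a coordinate-descent solve on the same statistics), not the paper's implementation.

```python
import numpy as np

def fold_stats(X, y, k):
    """One pass over the data: accumulate per-fold Gram matrices and
    cross-products, the sufficient statistics for ridge regression."""
    n, p = X.shape
    folds = np.arange(n) % k
    G = np.zeros((k, p, p))   # X'X per fold
    c = np.zeros((k, p))      # X'y per fold
    for i in range(n):        # on MapReduce, this loop is the map phase
        f = folds[i]
        G[f] += np.outer(X[i], X[i])
        c[f] += y[i] * X[i]
    return G, c

def ridge_cv(X, y, lambdas, k=5):
    """Choose lambda by k-fold CV using only the accumulated statistics;
    no further passes over the data are needed."""
    G, c = fold_stats(X, y, k)
    folds = np.arange(len(y)) % k
    best, best_err = None, np.inf
    for lam in lambdas:
        err = 0.0
        for f in range(k):
            # leave-one-fold-out statistics = totals minus the held-out fold
            Gf, cf = G.sum(axis=0) - G[f], c.sum(axis=0) - c[f]
            beta = np.linalg.solve(Gf + lam * np.eye(X.shape[1]), cf)
            mask = folds == f
            err += np.sum((y[mask] - X[mask] @ beta) ** 2)
        if err < best_err:
            best, best_err = lam, err
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)
print(ridge_cv(X, y, lambdas=np.logspace(-2, 3, 20)))
```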


Misleading modelling: overfitting, cross-validation, and the bias-variance trade-off

#artificialintelligence

This idea of building generalizable models is the motivation behind splitting your dataset into a training set (on which models can be trained) and a test set (which is held out until the very end of the analysis and provides an unbiased estimate of model performance).
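A minimal illustration of that split using scikit-learn's train_test_split; the synthetic data and model below are our own, not the article's.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(size=500)

# Hold the test set out once, at the very start; it is touched only for
# the final performance estimate, never for model or parameter selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```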


Computing AIC for black-box models using Generalised Degrees of Freedom: a comparison with cross-validation

arXiv.org Machine Learning

Generalised Degrees of Freedom (GDF), as defined by Ye (1998, JASA 93:120-131), represent the sensitivity of model fits to perturbations of the data. As such they can be computed for any statistical model, making it possible, in principle, to derive the number of parameters in machine-learning approaches. Defined originally for normally distributed data only, here we investigate the potential of this approach for Bernoulli data. GDF values for models of simulated and real data are compared to model-complexity estimates from cross-validation. Similarly, we computed GDF-based AICc for randomForest, neural networks, and boosted regression trees and demonstrated its similarity to cross-validation. GDF estimates for binary data were unstable and inconsistently sensitive to the number of data points perturbed simultaneously, while at the same time being extremely computer-intensive to calculate. Repeated 10-fold cross-validation was more robust, based on fewer assumptions, and faster to compute. Our findings suggest that the GDF approach does not readily transfer to Bernoulli data or to a wider range of regression approaches.
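To make Ye's definition concrete, here is a naive finite-difference sketch of GDF for a black-box fitter on Gaussian data; as the abstract notes, one refit per data point makes this expensive. The callback name `fit_predict` is our own assumption, not an API from the paper.

```python
import numpy as np

def gdf(fit_predict, X, y, eps=1e-4):
    """Finite-difference estimate of Generalised Degrees of Freedom
    (Ye, 1998): the summed sensitivity of each fitted value to a small
    perturbation of the corresponding response. Requires one refit per
    data point, hence expensive for black-box learners."""
    base = fit_predict(X, y)
    total = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += eps
        total += (fit_predict(X, y_pert)[i] - base[i]) / eps
    return total

# Sanity check on OLS, whose GDF equals its parameter count p = 3.
def ols_fit_predict(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return X @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
print(gdf(ols_fit_predict, X, y))  # close to 3
```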


Fast Cross-Validation via Sequential Testing

arXiv.org Machine Learning

With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure that uses nonparametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. By eliminating underperforming candidates quickly and keeping promising candidates as long as possible, the method speeds up the computation while preserving the capability of full cross-validation. Theoretical considerations underline the statistical power of our procedure. The experimental evaluation shows that our method reduces the computation time by a factor of up to 120 compared to full cross-validation, with negligible impact on accuracy.
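A heavily simplified sketch of the elimination idea, substituting a paired Wilcoxon test for the paper's sequential-testing machinery (so this is a stand-in, not the proposed procedure): candidates significantly worse than the current best on a growing subset are dropped, and survivors advance to the next, larger subset. The callback `fold_errors` is hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

def sequential_cv(configs, fold_errors, steps, alpha=0.05):
    """Candidate elimination on growing data subsets. `fold_errors(c, n)`
    must return per-fold CV errors of config `c` trained on the first n
    samples (a hypothetical callback standing in for a real learner)."""
    alive = list(configs)
    for n in steps:                       # linearly increasing subset sizes
        errs = {c: fold_errors(c, n) for c in alive}
        best = min(alive, key=lambda c: np.mean(errs[c]))
        survivors = [best]
        for c in alive:
            if c == best:
                continue
            # paired nonparametric test: is c significantly worse than best?
            _, p = wilcoxon(errs[c], errs[best], alternative="greater")
            if p >= alpha:                # not significantly worse: keep it
                survivors.append(c)
        alive = survivors
        if len(alive) == 1:
            break
    return alive

# Toy usage: candidates are numbers with true risk (c - 2)^2; per-fold
# errors are noisy draws whose noise shrinks as the subset grows.
rng = np.random.default_rng(0)
def toy_fold_errors(c, n):
    return (c - 2.0) ** 2 + rng.normal(0, 5.0 / np.sqrt(n), size=10)

print(sequential_cv([0, 1, 2, 3, 4], toy_fold_errors, steps=[100, 400, 1600]))
```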


Regularization Path of Cross-Validation Error Lower Bounds

Neural Information Processing Systems

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, the current practice of regularization parameter tuning is more of an art than a science; e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV error as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework provides a theoretical approximation guarantee on a set of solutions, bounding how far the CV error of the current best solution can be from the best possible CV error over the entire range of the regularization parameter. We demonstrate through numerical experiments that a theoretically guaranteed choice of regularization parameter in the above sense is possible at reasonable computational cost.


Cross-validation of matching correlation analysis by resampling matching weights

arXiv.org Machine Learning

The strength of association between a pair of data vectors is represented by a nonnegative real number, called a matching weight. For dimensionality reduction, we consider a linear transformation of data vectors, and define a matching error as the weighted sum of squared distances between transformed vectors with respect to the matching weights. Given data vectors and matching weights, the optimal linear transformation minimizing the matching error is solved by the spectral graph embedding of Yan et al. (2007). This method is a generalization of canonical correlation analysis, and will be called matching correlation analysis (MCA). In this paper, we consider a novel sampling scheme where the observed matching weights are randomly sampled from underlying true matching weights with small probability, whereas the data vectors are treated as constants. We then investigate a cross-validation by resampling the matching weights. Our asymptotic theory shows that the cross-validation, if rescaled properly, computes an unbiased estimate of the matching error with respect to the true matching weights. Existing ideas of cross-validation for resampling data vectors, instead of resampling matching weights, are not applicable here. MCA can be used for data vectors from multiple domains with different dimensions via an embarrassingly simple idea of coding the data vectors. This method will be called cross-domain matching correlation analysis (CDMCA), and an interesting connection to the classical associative memory model of neural networks is also discussed.
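In notation of our own choosing (the abstract defines it only in words), the matching error minimized over a linear map $A$ is \[ E(A) = \sum_{i,j} w_{ij} \, \| A x_i - A x_j \|_2^2, \] where $w_{ij} \ge 0$ are the matching weights and the $x_i$ are the data vectors; subject to a suitable normalization constraint on $A$, the minimizer is given by the spectral graph embedding mentioned above.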


Fast Cross-Validation for Incremental Learning

arXiv.org Machine Learning

Cross-validation (CV) is one of the main tools for performance estimation and parameter tuning in machine learning. The general recipe for computing the CV estimate is to run a learning algorithm separately for each CV fold, a computationally expensive process. In this paper, we propose a new approach to reduce the computational burden of CV-based performance estimation. As opposed to all previous attempts, which are specific to a particular learning model or problem domain, we propose a general method applicable to a large class of incremental learning algorithms, which are particularly well suited to big-data problems. In particular, our method applies to a wide range of supervised and unsupervised learning tasks with different performance criteria, as long as the base learning algorithm is incremental. We show that the running time of the algorithm scales logarithmically, rather than linearly, in the number of CV folds. Furthermore, the algorithm has favorable properties for parallel and distributed implementation. Experiments with state-of-the-art incremental learning algorithms confirm the practicality of the proposed method.
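The abstract only states the logarithmic scaling; one way such sharing can work for incremental learners is a recursive fold tree, sketched below under our own assumptions (an illustration of the sharing idea, not necessarily the paper's exact algorithm). Each fold's data is consumed O(log k) times in total instead of O(k) times across the k leave-one-fold-out models.

```python
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

def tree_cv(model, fold_data, classes):
    """Recursive fold-tree CV for incremental learners: each leaf evaluates
    a model that has been incrementally trained on every other fold, and
    training work is shared along the tree."""
    def recurse(m, folds):
        # Invariant: m has already been trained on all folds NOT in `folds`.
        if len(folds) == 1:
            X, y = fold_data[folds[0]]
            return [m.score(X, y)]           # m saw all folds except this one
        mid = len(folds) // 2
        left, right = folds[:mid], folds[mid:]
        m_left = copy.deepcopy(m)            # train on right, descend into left
        for f in right:
            m_left.partial_fit(*fold_data[f], classes=classes)
        m_right = copy.deepcopy(m)           # train on left, descend into right
        for f in left:
            m_right.partial_fit(*fold_data[f], classes=classes)
        return recurse(m_left, left) + recurse(m_right, right)
    return recurse(model, list(range(len(fold_data))))

# Usage: 8-fold CV of a linear classifier trained via partial_fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
folds = [(X[i::8], y[i::8]) for i in range(8)]
scores = tree_cv(SGDClassifier(random_state=0), folds, classes=np.array([0, 1]))
print("mean CV accuracy:", np.mean(scores))
```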


Bayesian Cross Validation and WAIC for Predictive Prior Design in Regular Asymptotic Theory

arXiv.org Machine Learning

Prior design is one of the most important problems in both statistics and machine learning. Cross-validation (CV) and the widely applicable information criterion (WAIC) are predictive measures for Bayesian estimation; however, it has been difficult to apply them to find the optimal prior because their mathematical properties in prior evaluation have been unknown and the region of the hyperparameters is too wide to be examined. In this paper, we derive a new formula that clarifies the theoretical relation among CV, WAIC, and the generalization loss, and by which the optimal hyperparameter can be found directly. The formula clarifies three facts about predictive prior design. Firstly, CV and WAIC have the same second-order asymptotic expansion; hence they are asymptotically equivalent as optimizers of the hyperparameter. Secondly, the hyperparameter that minimizes CV or WAIC asymptotically minimizes the average generalization loss, but not the random generalization loss. Lastly, by using the mathematical relation between priors, the variances of the hyperparameters optimized by CV and WAIC can be reduced at small computational cost. We also show that the hyperparameter optimized by DIC or the marginal likelihood does not, in general, minimize the average or random generalization loss.
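For readers unfamiliar with WAIC as a predictive measure, here is the standard Watanabe/Gelman computation from posterior draws; the toy data and conjugate posterior below are our own, chosen so no MCMC library is needed.

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """WAIC from an (S, n) array of pointwise log-likelihoods evaluated
    at S posterior draws, on the deviance scale (lower is better)."""
    S = log_lik.shape[0]
    lppd = logsumexp(log_lik, axis=0) - np.log(S)  # log pointwise pred. density
    p_waic = log_lik.var(axis=0, ddof=1)           # effective parameter count
    return -2 * np.sum(lppd - p_waic)

# Toy example: normal mean with known unit variance and a flat prior,
# so the posterior of the mean is N(ybar, 1/n) and draws are direct.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
post_mu = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=2000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - post_mu[:, None]) ** 2
print("WAIC:", waic(log_lik))
```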