Cross Validation


Approximate cross-validation formula for Bayesian linear regression

arXiv.org Machine Learning

Cross-validation (CV) is a technique for evaluating the predictive ability of statistical models/learning systems on the basis of a given data set. Despite its wide applicability, its rather heavy computational cost can prevent its use as the system size grows. To resolve this difficulty in the case of Bayesian linear regression, we develop a formula for evaluating the leave-one-out CV error approximately without actually performing CV. The usefulness of the developed formula is tested by a statistical-mechanical analysis of a synthetic model and confirmed by application to a real-world supernova data set.
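
The paper's specific approximation is not reproduced here, but the idea of estimating the leave-one-out CV error from a single fit can be illustrated with the classical shortcut for the MAP (ridge) estimator of Bayesian linear regression, which avoids the n refits of brute-force LOO. The sketch below uses synthetic data and NumPy only; the prior precision lam and the data sizes are illustrative assumptions, not values from the paper.

    # Sketch: leave-one-out CV error for the MAP (ridge) estimator of Bayesian linear
    # regression without refitting. This is the classical hat-matrix shortcut, not
    # necessarily the formula derived in the paper. Data are synthetic.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, lam = 200, 20, 1.0              # lam stands in for the Gaussian prior precision
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

    A = X.T @ X + lam * np.eye(p)         # full-data fit
    beta = np.linalg.solve(A, X.T @ y)
    H = X @ np.linalg.solve(A, X.T)       # hat matrix
    resid = y - X @ beta

    # LOO residuals in one pass (exact for the MAP estimator with fixed lam).
    loo_fast = np.mean((resid / (1.0 - np.diag(H))) ** 2)

    # Brute-force LOO for comparison: refit n times.
    errs = []
    for i in range(n):
        m = np.arange(n) != i
        Ai = X[m].T @ X[m] + lam * np.eye(p)
        bi = np.linalg.solve(Ai, X[m].T @ y[m])
        errs.append((y[i] - X[i] @ bi) ** 2)
    print(loo_fast, np.mean(errs))        # the two estimates agree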



Risk-consistency of cross-validation with lasso-type procedures

arXiv.org Machine Learning

The lasso and related sparsity inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified up to unknown constants. In practice, however, this oracle tuning parameter is inaccessible so one must use the data to select one. Common statistical practice is to use a variant of cross-validation for this task. However, little is known about the theoretical properties of the resulting predictions with such data-dependent methods. We consider the high-dimensional setting with random design wherein the number of predictors $p$ grows with the number of observations $n$. Under typical assumptions on the data generating process, similar to those in the literature, we recover oracle rates up to a log factor when choosing the tuning parameter with cross-validation. Under weaker conditions, when the true model is not necessarily linear, we show that the lasso remains risk consistent relative to its linear oracle. We also generalize these results to the group lasso and square-root lasso and investigate the predictive and model selection performance of cross-validation via simulation.
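
As a concrete illustration of the practice the paper analyzes (choosing the lasso tuning parameter by cross-validation rather than relying on an inaccessible oracle value), the sketch below uses scikit-learn's LassoCV on synthetic data; the sample sizes, the 10-fold scheme, and the data-generating settings are assumptions for illustration, not taken from the paper.

    # Hedged sketch: lasso with a cross-validated tuning parameter, the procedure whose
    # risk properties the paper studies. Synthetic high-dimensional data (p > n).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=300, n_features=500, n_informative=10,
                           noise=5.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = LassoCV(cv=10).fit(X_tr, y_tr)            # tuning parameter chosen by CV
    print("CV-chosen penalty:", model.alpha_)
    print("nonzero coefficients:", np.sum(model.coef_ != 0))
    print("held-out R^2:", model.score(X_te, y_te))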


Cross-validation in R: a do-it-yourself and a black box approach

@machinelearnbot

In my previous post, we saw that R-squared can lead to a misleading interpretation of the quality of our regression fit in terms of prediction power. One thing that R-squared offers no protection against is overfitting. Cross-validation, on the other hand, by allowing us to have cases in our testing set that are different from the cases in our training set, inherently offers protection against overfitting. In leave-one-out cross-validation, one case in our data set is used as the test set, while the remaining cases are used as the training set. We iterate through the data set until every case has served as the test set.
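
The post itself works in R; the sketch below is a Python analogue of the same idea, contrasting a do-it-yourself leave-one-out loop with the "black box" route via scikit-learn. The data and the choice of a plain linear model are illustrative assumptions.

    # Python analogue of the post's R example: leave-one-out CV done by hand and via a
    # black-box helper, on synthetic data with an ordinary linear regression.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = make_regression(n_samples=60, n_features=5, noise=10.0, random_state=1)

    # Do-it-yourself: each case serves as the test set exactly once.
    sq_errs = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        fit = LinearRegression().fit(X[train], y[train])
        sq_errs.append((y[i] - fit.predict(X[i:i + 1])[0]) ** 2)
    print("DIY LOO MSE:", np.mean(sq_errs))

    # Black box: the same computation through scikit-learn.
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=LeaveOneOut(), scoring="neg_mean_squared_error")
    print("black-box LOO MSE:", -scores.mean())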


Bootstrap and cross-validation for evaluating modelling strategies

#artificialintelligence

I've been re-reading Frank Harrell's Regression Modelling Strategies, a must-read for anyone who ever fits a regression model. Be prepared, though: depending on your background, you might get 30 pages in and suddenly become convinced you've been doing nearly everything wrong, which can be disturbing. I wanted to evaluate three simple modelling strategies for dealing with data with many variables. Using data with 54 variables on 1,785 area units from New Zealand's 2013 census, I'm looking to predict median income on the basis of the other 53 variables. The features are all continuous and are variables like "mean number of bedrooms", "proportion of individuals with no religion" and "proportion of individuals who are smokers". None of these is exactly what I would use for real, but they serve the purpose of setting up a competition of strategies that I can test with a variety of model validation techniques.
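
The census data and the post's exact three strategies are not reproduced here; as a hedged sketch of the kind of comparison described, the code below pits three simple strategies against each other with repeated k-fold cross-validation on synthetic data of the same shape (1,785 rows, 53 continuous predictors). The strategies, settings, and data are illustrative assumptions.

    # Hedged sketch: comparing modelling strategies with a resampling-based validation
    # scheme. Synthetic data stand in for the New Zealand census data; the three
    # strategies are illustrative, not the post's own.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression, LassoCV
    from sklearn.model_selection import RepeatedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=1785, n_features=53, n_informative=15,
                           noise=20.0, random_state=0)

    strategies = {
        "all 53 predictors, OLS": LinearRegression(),
        "lasso, penalty chosen by CV": make_pipeline(StandardScaler(), LassoCV(cv=5)),
        "PCA to 10 components, then OLS": make_pipeline(StandardScaler(),
                                                        PCA(n_components=10),
                                                        LinearRegression()),
    }

    cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
    for name, model in strategies.items():
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
        print(f"{name}: RMSE {-scores.mean():.1f} (+/- {scores.std():.1f})")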


Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models

arXiv.org Machine Learning

The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximative LOO cross-validation estimators. That is, for each method (Laplace and EP) we compare the approximate fast computation with the exact brute force LOO computation. Secondarily, we evaluate the accuracy of the Laplace and EP approximations themselves against a ground truth established through extensive Markov chain Monte Carlo simulation. Our empirical results show that the approach based upon a Gaussian approximation to the LOO marginal distribution (the so-called cavity distribution) gives the most accurate and reliable results among the fast methods.
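
The Laplace and EP machinery studied in the paper is not reproduced here. As a simplified illustration of the fast-versus-brute-force comparison, the sketch below uses the analytically tractable special case of GP regression with a Gaussian likelihood, where the LOO predictive means and variances follow directly from the inverse of the kernel matrix ($\mu_i = y_i - [K^{-1}y]_i / [K^{-1}]_{ii}$, $\sigma_i^2 = 1/[K^{-1}]_{ii}$); the kernel, noise level, and data are illustrative assumptions.

    # Fast analytic LOO vs. brute-force LOO for GP regression with Gaussian noise, as a
    # stand-in for the paper's Laplace/EP setting. Synthetic 1-D data, RBF kernel.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, noise = 80, 0.1
    x = np.sort(rng.uniform(-3, 3, n))
    y = np.sin(x) + rng.normal(scale=noise, size=n)

    def kernel(a, b, ell=1.0, sf=1.0):
        return sf ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

    K = kernel(x, x) + noise ** 2 * np.eye(n)

    # Fast LOO from a single matrix inverse.
    Kinv = np.linalg.inv(K)
    loo_var = 1.0 / np.diag(Kinv)
    loo_mu = y - (Kinv @ y) * loo_var
    fast_lpd = norm.logpdf(y, loo_mu, np.sqrt(loo_var)).sum()

    # Brute-force LOO: refit the GP n times.
    brute_lpd = 0.0
    for i in range(n):
        m = np.arange(n) != i
        Km = kernel(x[m], x[m]) + noise ** 2 * np.eye(n - 1)
        ks = kernel(x[i:i + 1], x[m])[0]
        mu = ks @ np.linalg.solve(Km, y[m])
        var = noise ** 2 + kernel(x[i:i + 1], x[i:i + 1])[0, 0] - ks @ np.linalg.solve(Km, ks)
        brute_lpd += norm.logpdf(y[i], mu, np.sqrt(var))
    print(fast_lpd, brute_lpd)   # the two log predictive densities agree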


How to test classifier better than chance using k-fold cross-validation? • /r/MachineLearning

@machinelearnbot

I have 400 units and 10 groups, and I'm classifying the units' group membership using a discriminant function analysis or linear discriminant analysis. During cross-validation, I want to test whether my solution is doing a better job of classifying them than chance (10%). I can get an error rate, but I don't know how to compare it statistically against the chance rate. With the hold-out approach, I can test it using Press' Q statistic or the Maximum Chance Criterion, but with k-fold cross-validation I don't think I can use this approach.
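
One common, approximate answer (not taken from the thread itself) is to pool the held-out predictions from all k folds and compare the number classified correctly against the 10% chance rate with a one-sided binomial test; treating the 400 pooled predictions as independent Bernoulli trials is only an approximation, since the folds share training data. The count of correct classifications below is a hypothetical placeholder.

    # Approximate test of "better than chance" after k-fold CV: pool the held-out
    # predictions and run a one-sided binomial test against the 10% chance rate.
    # Independence of the pooled predictions is an approximation.
    from scipy.stats import binomtest

    n_units = 400        # held-out predictions pooled over the k folds
    n_correct = 160      # hypothetical count of correct classifications (40% accuracy)
    chance = 0.10        # 10 equally likely groups

    result = binomtest(n_correct, n_units, p=chance, alternative="greater")
    print("observed accuracy:", n_correct / n_units)
    print("p-value vs. chance:", result.pvalue)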


A note on adjusting $R^2$ for using with cross-validation

arXiv.org Machine Learning

We show how to adjust the coefficient of determination ($R^2$) when used for measuring predictive accuracy via leave-one-out cross-validation.
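
The note's specific adjustment is not reproduced here. For context, the quantity at issue is the cross-validated (predictive) version of $R^2$, commonly defined from the leave-one-out predictions as

    $R^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_{-i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$

where $\hat{y}_{-i}$ is the prediction for observation $i$ from a model fit with that observation left out and $\bar{y}$ is the sample mean of the responses.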



How do you know if your model is going to work? Part 4: Cross-validation techniques

#artificialintelligence

When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this concluding Part 4 of our four-part mini-series "How do you know if your model is going to work?" we demonstrate cross-validation techniques, which attempt to improve statistical efficiency by repeatedly splitting the data into training and test sets and re-performing model fitting and model evaluation.
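
As a minimal sketch of the splitting scheme described above (not the article's own code), the snippet below runs 5-fold cross-validation by hand with scikit-learn's KFold; the data and the logistic-regression model are illustrative assumptions.

    # Minimal k-fold cross-validation loop: each fold serves once as the test set while
    # the model is refit on the remaining folds. Synthetic classification data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    accs = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    print("per-fold accuracy:", np.round(accs, 3))
    print("cross-validated accuracy:", np.mean(accs))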