AITopics | vtreat

vtreat

When Cross-Validation is More Powerful than Regularization

#artificialintelligence | Nov-12-2019, 23:08:19 GMT

Regularization is a way of avoiding overfitting by restricting the magnitude of model coefficients (or, in deep learning, node weights). A simple example of regularization is the use of ridge or lasso regression to fit linear models in the presence of collinear variables or (quasi-)separation. The intuition is that smaller coefficients are less sensitive to idiosyncrasies in the training data and hence less likely to overfit. Cross-validation is a way to safely reuse training data in nested model situations. This includes both the case of setting hyperparameters before fitting a model and the case of fitting models (let's call them base learners) that are then used as variables in downstream models, as shown in Figure 1.
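
The nested-model case can be made concrete with a small sketch. It is written in Python rather than R, and it is only an illustration under stated assumptions, not the article's procedure or vtreat's implementation: the base learner is a per-level mean of the outcome for a categorical variable, and the helper names `level_means` and `cross_encode` are hypothetical. Each row is encoded by a base learner fit on the other folds only, so the downstream model never sees a value derived from that row's own outcome.

```python
import random
from collections import defaultdict


def level_means(levels, y):
    """Base learner: mean of y for each level of a categorical variable."""
    sums, counts = defaultdict(float), defaultdict(int)
    for lev, yi in zip(levels, y):
        sums[lev] += yi
        counts[lev] += 1
    return {lev: sums[lev] / counts[lev] for lev in sums}


def cross_encode(levels, y, n_folds=3, seed=0):
    """Encode each row with a base learner fit only on the other folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    encoded = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        means = level_means([levels[i] for i in train], [y[i] for i in train])
        # Levels unseen in the training folds fall back to the training mean.
        fallback = sum(y[i] for i in train) / len(train)
        for i in held_out:
            encoded[i] = means.get(levels[i], fallback)
    return encoded


# A pure-noise variable: every level occurs exactly once.
levels = [f"id_{i}" for i in range(12)]
y = [0, 1] * 6

# Naive reuse of the training data: the encoding memorizes the outcome.
naive_plan = level_means(levels, y)
naive = [naive_plan[lev] for lev in levels]

# Cross-validated encoding: no row's code depends on its own outcome.
crossed = cross_encode(levels, y)

print("naive encoding equals y:", naive == y)
print("cross-validated encoding:", crossed)
```

In the naive case the noise variable looks like a perfect predictor to any downstream model, which is the overfit the excerpt warns about. In the cross-validated case it carries no information about the row's own outcome.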

Encoding categorical variables: one-hot and beyond

#artificialintelligence | Apr-15-2017, 19:30:21 GMT

R has "one-hot" encoding hidden in most of its modeling paths. Asking an R user where one-hot encoding is used is like asking a fish where there is water; they can't point to it because it is everywhere. Much of the encoding in R is essentially based on "contrasts" implemented in stats::model.matrix(). Note: do not use base::data.matrix(). Such mal-coding can be a critical flaw when you are building a model and then later using the model on new data (be it cross-validation data, test data, or future application data). Many R users are not familiar with this issue, because encoding is hidden in model training and the instructions for encoding new data are stored as part of the model.
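
The point about storing the encoding with the model can be sketched as follows. This is a Python illustration rather than R, and it does not reproduce how stats::model.matrix() works internally; the function names `fit_one_hot` and `apply_one_hot` are hypothetical. The encoding plan (the list of levels) is learned once from the training data and then reused unchanged on new data.

```python
def fit_one_hot(values):
    """Learn the encoding plan (a sorted list of levels) from training data."""
    return sorted(set(values))


def apply_one_hot(plan, values):
    """Encode values with a stored plan; unseen levels become all-zero rows."""
    return [[1 if v == lev else 0 for lev in plan] for v in values]


train = ["a", "b", "c", "b"]
new = ["c", "d"]  # "d" never appeared in training

# Correct: the plan is fit once on training data and stored with the model.
plan = fit_one_hot(train)
train_x = apply_one_hot(plan, train)
new_x = apply_one_hot(plan, new)

# Mal-coding: re-deriving the plan from the new data changes both the
# number of columns and what each column means.
bad_plan = fit_one_hot(new)
bad_x = apply_one_hot(bad_plan, new)

print("stored plan:", plan, "->", new_x)
print("re-derived plan:", bad_plan, "->", bad_x)
```

With the stored plan, the new rows have the same three columns the model was trained on. With the re-derived plan, the first column would mean "c" instead of "a", so a model applied to it would silently misread its inputs.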

vtreat: prepare data

#artificialintelligence | Mar-5-2017, 10:15:50 GMT

This article is on preparing data for modeling in R using vtreat. Suppose we wish to work with some data. Our example task is to train a classification model for credit approval using the ranger implementation of the random forest method. We will take our data from John Ross Quinlan's re-processed "credit approval" dataset hosted at the UCI Machine Learning Repository (Lichman, M., 2013). For convenience we have copied the data to our working directory here.
