AITopics | normalize test data

Noob question: why should we normalize test data with mean and std from training data? • /r/MachineLearning

#artificialintelligence, Jun-6-2016, 20:05:45 GMT

Nah. It's only really required for things like neural networks, where it keeps the features on a scale that gradient descent handles well. For linear/logistic regression it isn't strictly required either, but it makes the weights comparable across features, so you can read them as feature importance/contribution to the prediction. Things like random forests are based on decision trees, which will find a split anywhere, so it doesn't matter how the features are scaled. For stuff like nearest neighbours, it can be important, or it can hurt. This is because normalisation is like saying all features are equally important, which isn't necessarily true. You might have spatial information in a rectangular space, for example, in which case normalising favours the small axis of that rectangle over the other axis.
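To make the tree and nearest-neighbour points concrete, here is a minimal NumPy sketch. Everything in it is made up for illustration: the toy data, the 100-by-1 rectangle, and the helper names `best_split_mask` and `nearest` are assumptions, not anything from the thread. It tries to show that a one-feature decision stump groups the samples the same way after rescaling, while a nearest-neighbour lookup can return a different point.

```python
import numpy as np

def best_split_mask(feature, labels):
    """Toy decision stump (hypothetical helper): try every midpoint between
    sorted unique values and return the left-side mask with the fewest errors."""
    values = np.unique(feature)
    thresholds = (values[:-1] + values[1:]) / 2
    best_mask, best_err = None, np.inf
    for t in thresholds:
        left = feature <= t
        # Errors when each side predicts its majority class (binary 0/1 labels).
        err_left = min(labels[left].sum(), (1 - labels[left]).sum())
        err_right = min(labels[~left].sum(), (1 - labels[~left]).sum())
        if err_left + err_right < best_err:
            best_mask, best_err = left, err_left + err_right
    return best_mask

def nearest(points, query):
    """Index of the point closest to the query in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(points - query, axis=1)))

# Tree-style split: rescaling the feature moves the threshold, not the grouping.
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
mask_raw = best_split_mask(feature, labels)
mask_scaled = best_split_mask(feature * 1000.0 + 5.0, labels)

# Nearest neighbour in an assumed 100-wide, 1-tall rectangle.
points = np.array([[10.0, 0.0],   # A: 10 units away along the long axis
                   [0.0, 1.0]])   # B: 1 unit away along the short axis
query = np.array([0.0, 0.0])
ranges = np.array([100.0, 1.0])   # assumed extent of each axis

raw_nn = nearest(points, query)                      # B is closer in raw units
scaled_nn = nearest(points / ranges, query / ranges) # A is closer after scaling
```

In this toy setup the stump's grouping is unchanged by rescaling, but dividing each axis by its range makes a 1-unit gap on the short axis count as much as a 100-unit gap on the long one, so the nearest neighbour flips from B to A. Whether that is a help or a hindrance depends on whether the raw distances were meaningful in the first place.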

Noob question: why should we normalize test data with mean and std from training data? • /r/MachineLearning

@machinelearnbot, Jun-4-2016, 21:40:42 GMT

Well, since both sets are samples from the same distribution, they should ideally have similar means and variances. They obviously won't be identical though, and in this case it makes sense to use the means and variances from the training data, since that's what the model was trained on. The model approximates a mapping from data standardized by the training data's mean and variance, so standardizing the test data with its own mean and variance would feed the model inputs on a slightly different scale from the one it learned and give you inaccurate results.
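As a rough illustration of the point above, the sketch below computes the mean and standard deviation on the training set only and reuses them for the test set. The synthetic data and the helper names `fit_standardizer` and `standardize` are assumptions made up for this example.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and std, computed from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def standardize(X, mean, std):
    """Apply a previously fitted mean/std to any data set."""
    return (X - mean) / std

# Synthetic data: train and test are drawn from the same distribution.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

mean, std = fit_standardizer(X_train)
X_train_std = standardize(X_train, mean, std)
X_test_std = standardize(X_test, mean, std)  # training statistics reused here

# The same raw sample should always map to the same model input.
x = np.array([[60.0, 40.0, 50.0]])
x_std = standardize(x, mean, std)
# Using the test set's own statistics maps the same sample somewhere else.
x_wrong = standardize(x, X_test.mean(axis=0), X_test.std(axis=0))
```

The last two lines are the heart of it: `x_std` is what the model learned to expect for that raw sample, while `x_wrong` shifts with whatever test batch happens to be on hand. It also shows why the test set's own statistics break down for small batches; a single test sample has no meaningful standard deviation at all.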
