Imbalanced Classification with the Adult Income Dataset

#artificialintelligence

Many binary classification tasks do not have an equal number of examples from each class, i.e., the class distribution is skewed or imbalanced. A popular example is the adult income dataset, which involves predicting personal income level as above or below $50,000 per year from personal details such as relationship and education level. There are many more cases of incomes below $50K than above $50K, although the skew is not severe. This means that techniques for imbalanced classification can be used while model performance can still be reported using classification accuracy, as with balanced classification problems. In this tutorial, you will discover how to develop and evaluate a model for the imbalanced adult income classification dataset.
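
As a rough sketch of that workflow (not the tutorial's exact code), the snippet below loads the adult dataset from OpenML and reports classification accuracy under repeated stratified k-fold cross-validation. The OpenML dataset name and version, the preprocessing choices, and the gradient boosting model are illustrative assumptions.

```python
from sklearn.datasets import fetch_openml
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# load features X and the binary income target y ('<=50K' vs '>50K');
# the 'adult' name and version=2 are assumptions about the OpenML listing
X, y = fetch_openml('adult', version=2, as_frame=True, return_X_y=True)

# impute then one-hot encode categorical columns; impute then scale numeric ones
cat_cols = X.select_dtypes(include=['category', 'object']).columns
num_cols = X.select_dtypes(include=['number']).columns
prep = ColumnTransformer([
    ('cat', Pipeline([('imp', SimpleImputer(strategy='most_frequent')),
                      ('ohe', OneHotEncoder(handle_unknown='ignore'))]), cat_cols),
    ('num', Pipeline([('imp', SimpleImputer(strategy='median')),
                      ('sc', MinMaxScaler())]), num_cols),
])
model = Pipeline([('prep', prep), ('clf', GradientBoostingClassifier())])

# stratified splits preserve the skewed class ratio in every fold,
# so plain accuracy remains a fair summary for this mildly skewed task
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean accuracy: %.3f (%.3f)' % (scores.mean(), scores.std()))
```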


kNN Imputation for Missing Values in Machine Learning

#artificialintelligence

Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values for each column in your input data prior to modeling your prediction task. This is called missing data imputation, or imputing for short. A popular approach to missing data imputation is to use a model to predict the missing values. This requires a model to be created for each input variable that has missing values. The k-nearest neighbors (kNN) algorithm is an effective choice for this, filling in each missing value using the values from the most similar rows in the data.
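
A minimal sketch of this idea using scikit-learn's KNNImputer is shown below; the toy array is made up for illustration, and in practice X would be your feature matrix with NaN marking the missing entries.

```python
import numpy as np
from sklearn.impute import KNNImputer

# toy feature matrix with NaN marking two missing values
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# each missing value is replaced by the mean of that feature across the
# k nearest rows, with nearness measured on the observed features
imputer = KNNImputer(n_neighbors=2, weights='uniform')
X_filled = imputer.fit_transform(X)
print(X_filled)
```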


Develop a Model for the Imbalanced Classification of Good and Bad Credit

#artificialintelligence

Misclassification errors on the minority class are more important than other types of prediction errors for some imbalanced classification tasks. One example is the problem of classifying bank customers by whether they should receive a loan. Granting a loan to a bad customer marked as good costs the bank more than denying a loan to a good customer marked as bad. This requires careful selection of a performance metric that both promotes minimizing misclassification errors in general and favors minimizing one type of misclassification error over another. The German credit dataset is a standard imbalanced classification dataset that has this property of differing misclassification costs. Models fit on this dataset can be evaluated using the F-beta measure, which quantifies model performance generally while capturing the requirement that one type of misclassification error is more costly than another. In this tutorial, you will discover how to develop and evaluate a model for the imbalanced German credit classification dataset.
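
As a small sketch of scoring with the F-beta measure in scikit-learn: beta greater than 1 weights recall more heavily than precision, which suits tasks where missing a bad customer is the costlier error. The synthetic data below stands in for the German credit dataset, and beta=2 is an illustrative choice rather than a value fixed by the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# synthetic stand-in for the German credit data: ~30% minority ("bad") class
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=1)

# beta=2 (the F2 measure) counts recall on the costly class twice as
# heavily as precision when combining the two into one score
scorer = make_scorer(fbeta_score, beta=2, pos_label=1)

# a naive baseline that always predicts the minority class, giving a
# floor that any real model should beat
model = DummyClassifier(strategy='constant', constant=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring=scorer, cv=cv, n_jobs=-1)
print('Baseline F2: %.3f' % scores.mean())
```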


Data Preparation for Gradient Boosting with XGBoost in Python - Machine Learning Mastery

#artificialintelligence

XGBoost is a popular implementation of gradient boosting because of its speed and performance. Internally, XGBoost represents all problems as regression predictive modeling problems that take only numerical values as input. If your data is in a different form, it must be prepared into the expected format. In this post, you will discover how to prepare your data for use with gradient boosting via the XGBoost library in Python.
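
A minimal sketch of that preparation is below (not the post's exact code): one-hot encode categorical inputs and integer-encode a string class label before fitting XGBoost, which expects numeric arrays. The column names and toy values are made up for illustration, and the xgboost package is assumed to be installed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# toy dataset with one categorical input, one numeric input, and a
# string class label; column names are hypothetical
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green'],
    'size':  [1.0, 2.0, 1.5, 3.0],
    'label': ['good', 'bad', 'good', 'bad'],
})

# one-hot encode the categorical input column into numeric indicators
X = pd.get_dummies(df[['color', 'size']], columns=['color'], dtype=int)

# XGBoost needs integer class labels, so map the strings to 0/1
y = LabelEncoder().fit_transform(df['label'])

model = XGBClassifier(n_estimators=10)
model.fit(X, y)
print(model.predict(X))
```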