Missing Data Imputation for Supervised Learning

Aug-6-2018–arXiv.org Machine Learning

Missing data imputation can improve the performance of prediction models when missing data hide useful information. This paper compares methods for imputing missing categorical data in supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data under different levels of additional missing-data perturbation. We show that imputation methods can increase predictive accuracy in the presence of missing-data perturbation, and that this perturbation can itself improve prediction accuracy by regularizing the classifier. We achieve state-of-the-art results on the Adult dataset using missing-data perturbation and k-nearest-neighbors (k-NN) imputation.
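The abstract names two mechanisms, missing-data perturbation and k-NN imputation of categorical values, without giving the procedure. The following is a minimal sketch of one way they could fit together, not the paper's implementation. It assumes a `None` sentinel for missing values, a Hamming-style distance over features observed in both rows, and majority voting among the k nearest donors. The names `perturb`, `hamming`, and `knn_impute` are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumed sentinel for a missing categorical value


def perturb(rows, rate, seed=0):
    """Return a copy of rows with each observed value masked with probability `rate`."""
    rng = random.Random(seed)
    return [
        [MISSING if (v is not MISSING and rng.random() < rate) else v for v in row]
        for row in rows
    ]


def hamming(a, b):
    """Fraction of mismatches over features observed in both rows (1.0 if none are shared)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return 1.0
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows, k=3):
    """Fill each missing value with the most common value among the k nearest donor rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Donors are other rows that actually observe feature j.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not MISSING]
            if not donors:
                continue  # nothing to borrow from; leave the value missing
            donors.sort(key=lambda r: hamming(row, r))
            votes = Counter(r[j] for r in donors[:k])
            imputed[i][j] = votes.most_common(1)[0][0]
    return imputed
```

In a pipeline of this shape, `perturb` would be applied to the training rows at a chosen rate and `knn_impute` would then fill in both the original and the injected gaps before a classifier is fit. The perturbation rate and `k` are tuning choices that this sketch leaves open.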

artificial intelligence, data quality, machine learning

Country:
- Africa > South Africa (0.05)
- South America
  - Argentina (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
- North America > United States
  - California
    - Alameda County > Berkeley (0.14)
    - Orange County > Irvine (0.04)

Genre:
- Research Report (1.00)

Industry:
- Government > Regional Government (0.47)

Technology:
- Information Technology
  - Data Science > Data Quality (1.00)
  - Artificial Intelligence > Machine Learning
    - Performance Analysis > Accuracy (0.70)
    - Statistical Learning > Nearest Neighbor Methods (0.55)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.47)
