2 data-wrangling techniques for better machine learning
Acquired data rarely contains values for every feature of every instance. Values can go missing for a number of reasons: a faulty sensor, a software bug, mapping issues from the source system, or a survey field left intentionally blank. Because machine learning algorithms require a value to work with, a quick and easy way to make such a data set usable for model training is to delete either the instances (rows) with missing values or the affected feature (column). However, doing so negatively impacts model training: deleting instances not only decreases the amount of training data but can also create an imbalance in the training data. In addition, removing features altogether reduces the predictive power of the resulting model (Figure 1).
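To make the cost of deletion concrete, here is a minimal Python sketch on a small, made-up sensor data set. The feature names, the values, and the helper functions `drop_rows_with_missing` and `drop_columns_with_missing` are all assumptions for illustration and do not come from the article. Missing values are represented as `None`.

```python
from collections import Counter

# Hypothetical data set: None marks a missing value.
rows = [
    {"temp": 21.0, "humidity": 0.40, "pressure": 1012, "label": "ok"},
    {"temp": 22.5, "humidity": None, "pressure": 1009, "label": "fault"},
    {"temp": None, "humidity": 0.55, "pressure": 1011, "label": "fault"},
    {"temp": 19.8, "humidity": 0.47, "pressure": 1013, "label": "ok"},
    {"temp": 20.1, "humidity": 0.52, "pressure": 1010, "label": "ok"},
    {"temp": 23.0, "humidity": None, "pressure": 1008, "label": "fault"},
]


def drop_rows_with_missing(rows):
    """Keep only the instances in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_columns_with_missing(rows):
    """Remove every feature that is missing in at least one instance."""
    incomplete = {k for r in rows for k, v in r.items() if v is None}
    return [{k: v for k, v in r.items() if k not in incomplete} for r in rows]


complete_rows = drop_rows_with_missing(rows)
complete_cols = drop_columns_with_missing(rows)

# Compare the class distribution before and after deleting rows.
labels_before = Counter(r["label"] for r in rows)
labels_after = Counter(r["label"] for r in complete_rows)

print("rows before/after:", len(rows), len(complete_rows))
print("labels before:", dict(labels_before))
print("labels after: ", dict(labels_after))
print("columns left after dropping features:", sorted(complete_cols[0]))
```

In this toy example, deleting rows halves the data and removes every `fault` instance, because the missing values happen to cluster in that class. Deleting columns keeps all six rows but leaves only `pressure` as a feature.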
May-13-2021, 08:30:09 GMT