Is it 'always' necessary to treat outliers in a machine learning model?
Outliers are one of those issues we come across almost every day in machine learning modelling. Wikipedia defines an outlier as "an observation point that is distant from other observations." In other words, a minority of cases in the data set differ markedly from the majority of the data. I would like to classify outliers into two main categories: non-natural and natural. Non-natural outliers are those caused by measurement errors, faulty data collection, or incorrect data entry.
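To make "distant from other observations" concrete, here is a minimal sketch that flags candidate outliers with the common 1.5 × IQR rule. The rule, the helper name `iqr_outliers`, and the sample ages are illustrative assumptions, not something the article prescribes.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages; 420 looks like a data-entry slip for 42
ages = [23, 25, 31, 35, 38, 41, 44, 47, 52, 420]
flagged = iqr_outliers(ages)
print(flagged)
```

A rule like this only says that a point is unusual. Deciding whether it is a non-natural outlier (such as a typo) or a natural one still takes knowledge of how the data was collected.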
Aug-28-2018, 10:49:09 GMT