Explainable outlier detection through decision tree conditioning

Cortes, David

arXiv.org Machine Learning 

This work describes an outlier-detection procedure that aims at producing explanations for why an observation/point can be considered anomalous. The explanations are obtained by finding smart conditional distributions of a given variable: distributions under which the anomalous observation/point in question would fall according to the conditions, but for which its value on the variable of interest does not match the distribution of the other observations. These conditional distributions are obtained by splitting/separating/conditioning observations according to some other variable(s) in such a way that the information gain ([8]) in the variable of interest obtained by splitting the observations (assigning them to two or more groups) is maximized, in a similar way to decision tree algorithms such as CART ([3]) or C5.0 ([8]). This ensures that the conditions set for a variable are not spurious, but rather related to the multivariate distribution of the data, and the anomalous value is put into context by presenting key information about the variable's distribution among the rest of the observations. An example explainable outlier is sketched below: row [2230] - suspicious column: [T3] - suspicious value: [10.
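The conditioning idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. It assumes a categorical variable of interest `y`, a single numeric conditioning variable `x`, and one level of splitting only. The function names (`entropy`, `best_split`, `explain_outliers`) and the rarity cutoff `max_share` are hypothetical choices made for illustration. The sketch picks the threshold on `x` that maximizes the information gain in `y`, then reports rows whose `y` value is rare within the branch they fall into.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, gain) for the split on x with the highest information gain in y."""
    base = entropy(y)
    n = len(y)
    best_gain, best_thr = 0.0, None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain


def explain_outliers(x, y, max_share=0.1):
    """Flag rows whose y value is rare within their branch of the best split.

    max_share is an assumed cutoff: a value is reported when at most this
    fraction of the rows in the same branch share it.
    """
    thr, _ = best_split(x, y)
    if thr is None:  # no split improves on the unconditional distribution
        return []
    findings = []
    branches = (("<=", lambda v: v <= thr), (">", lambda v: v > thr))
    for side, in_branch in branches:
        rows = [i for i, xi in enumerate(x) if in_branch(xi)]
        counts = Counter(y[i] for i in rows)
        for i in rows:
            share = counts[y[i]] / len(rows)
            if share <= max_share:
                findings.append(
                    (i, f"y = {y[i]!r} where x {side} {thr}: "
                        f"only {share:.1%} of {len(rows)} rows share this value")
                )
    return findings


# Toy data: y is "a" for x < 20 and "b" otherwise, except row 5.
x = list(range(40))
y = ["a" if xi < 20 else "b" for xi in x]
y[5] = "b"

findings = explain_outliers(x, y)
for row, reason in findings:
    print(f"row [{row}] - {reason}")
```

In the toy data, the value "b" is common overall, so it would not stand out in the unconditional distribution of `y`. It only looks suspicious once the rows are conditioned on `x`, which is the kind of context the abstract describes.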
