A Probabilistic Transformation of Distance-Based Outliers
Muhr, David, Affenzeller, Michael, Küng, Josef
arXiv.org Artificial Intelligence
The scores of distance-based outlier detection methods are difficult to interpret, which makes it hard to choose a cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions, and that these distributions can in turn be used to convert distance-based outlier scores into outlier probabilities. Our experiments show that the probabilistic transformation does not impact detection performance over numerous tabular and image benchmark datasets but results in interpretable outlier scores with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and because it reuses existing distance computations, it adds no significant computational overhead.
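The general idea of reusing already-computed distances can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distance probability distributions are modeled, so the sketch assumes a single pooled empirical CDF over all pairwise distances as a stand-in. The function names (`pairwise_distances`, `knn_scores`, `ecdf_probabilities`), the choice of a k-nearest-neighbor score, and the toy data are all hypothetical.

```python
import bisect
import math


def pairwise_distances(points):
    """Return the full n x n Euclidean distance matrix as nested lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_scores(dist, k):
    """Raw outlier score: distance from each point to its k-th nearest neighbor."""
    scores = []
    for i, row in enumerate(dist):
        # Exclude the point's zero distance to itself before ranking.
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores


def ecdf_probabilities(dist, scores):
    """Map each score to the fraction of pooled pairwise distances <= score.

    Assumption: a pooled empirical CDF stands in for the distance
    probability distribution; the paper's own model may differ.
    """
    n = len(dist)
    # Reuse the distances that a kNN search would otherwise discard.
    pooled = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return [bisect.bisect_right(pooled, s) / len(pooled) for s in scores]


# Toy data: a tight cluster of four points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
dist = pairwise_distances(points)
scores = knn_scores(dist, k=2)
probs = ecdf_probabilities(dist, scores)

for point, score, prob in zip(points, scores, probs):
    print(point, round(score, 3), round(prob, 2))
```

Because an empirical CDF is non-decreasing, this mapping never reverses the order of two raw scores, which is one simple way to obtain the ranking stability the abstract describes. The outputs lie in [0, 1], so a cut-off can be stated on a common scale rather than in dataset-specific distance units.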
Jul-18-2023
- Country:
- North America > United States
- Texas > Dallas County
- Dallas (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New York > New York County
- New York City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California
- San Francisco County > San Francisco (0.14)
- San Diego County > San Diego (0.04)
- Europe
- United Kingdom > England (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Netherlands > South Holland
- Dordrecht (0.04)
- Austria > Upper Austria
- Linz (0.04)
- Asia
- Genre:
- Research Report (0.82)