Is AUC the best measure for practical comparison of anomaly detectors?
Škvára, Vít, Pevný, Tomáš, Šmídl, Václav
arXiv.org Artificial Intelligence
Data Mining and Knowledge Discovery manuscript No. (will be inserted by the editor)

Abstract The area under the receiver operating characteristic curve (AUC) is the standard measure for comparing anomaly detectors. Its advantage is that it provides a scalar number that allows a natural ordering and is independent of a threshold, which allows the choice of threshold to be postponed. In this work, we question whether AUC is a good metric for anomaly detection, or whether it gives a false sense of comfort because it relies on assumptions that are unlikely to hold in practice. Our investigation shows that variations of AUC emphasizing accuracy at low false positive rates seem to be better correlated with the needs of practitioners, but also that we can compare anomaly detectors only when we have representative examples of anomalous samples. This last result is disturbing, as it suggests that in many cases we should do active or few-shot learning instead of pure anomaly detection.

While this definition makes sense from the point of view of probability theory, practitioners are interested only in anomalies of a certain kind. For example, the use of anomaly detection in computer security is motivated by the assumption that attacks are rare, but not every rare event is an attack.

Anomaly detection has a long history and there are hundreds of models based on vastly different approaches, such as modifications of the k-nearest neighbors algorithm (Harmeling et al., 2006), random forests

Figure 1 Receiver operating characteristic (ROC) curve and the corresponding AUC of a degenerate anomaly detector. FPR / TPR stands for false / true positive rate.

Several comparative studies exist - e.g.
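The abstract's contrast between plain AUC and variants that emphasize low false positive rates can be illustrated with a small sketch. It is not taken from the paper: the helper names (roc_points, area_under_roc), the toy scores, and the cut-off of 0.25 are all assumptions chosen for illustration. Two hypothetical detectors receive the same AUC, yet only one of them finds any anomalies before raising false alarms.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists from sweeping the threshold from high to low scores."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all samples tied at this score before emitting a ROC point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def area_under_roc(fpr, tpr, max_fpr=1.0):
    """Trapezoidal area under the ROC curve restricted to FPR <= max_fpr (unnormalized)."""
    area = 0.0
    for k in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[k - 1], tpr[k - 1], fpr[k], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 1 = anomaly, 0 = normal; higher score = more anomalous.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.5, 0.4]    # two anomalies ranked first
scores_b = [0.6, 0.55, 0.5, 0.45, 0.9, 0.8, 0.2, 0.1]  # two normal samples ranked first

fpr_a, tpr_a = roc_points(labels, scores_a)
fpr_b, tpr_b = roc_points(labels, scores_b)

auc_a = area_under_roc(fpr_a, tpr_a)
auc_b = area_under_roc(fpr_b, tpr_b)
pauc_a = area_under_roc(fpr_a, tpr_a, max_fpr=0.25)
pauc_b = area_under_roc(fpr_b, tpr_b, max_fpr=0.25)

print(auc_a, auc_b)    # identical full AUC
print(pauc_a, pauc_b)  # different area in the low-FPR region
```

In this toy setting both detectors have an AUC of 0.5, but the area restricted to FPR <= 0.25 is 0.125 for detector A and 0 for detector B, so a low-FPR variant separates them where the full AUC does not.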
May-8-2023
- Country:
- North America > United States
- Alaska (0.04)
- Europe
- Czechia > Prague (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: