A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection

Kaixiang Yang, Jiarong Liu, Yupeng Song, Shuanghua Yang, Yujue Zhou

arXiv.org Machine Learning 

Abstract--Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumptions. This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address, rather than their mathematical forms or output structures. We categorize over twenty commonly used metrics into six dimensions: (1) basic accuracy-driven evaluation, (2) timeliness-aware reward mechanisms, (3) tolerance to labeling imprecision, (4) penalties reflecting human-audit cost, (5) robustness against random or inflated scores, and (6) parameter-free comparability for cross-dataset benchmarking. Comprehensive experiments are conducted to examine metric behavior under genuine, random, and oracle detection scenarios. By comparing their resulting score distributions, we quantify each metric's discriminative ability--its capability to distinguish meaningful detections from random noise. The results show that while most event-level metrics exhibit strong separability, several widely used metrics (e.g., NAB, Point-Adjust) demonstrate limited resistance to random-score inflation. These findings reveal that metric suitability must be inherently task-dependent and aligned with the operational objectives of IoT applications. The proposed framework offers a unified analytical perspective for understanding existing metrics and provides practical guidance for selecting or developing more context-aware, robust, and fair evaluation methodologies for time series anomaly detection.

The emergence of the Internet of Things (IoT) has accelerated digital transformation across numerous domains.
Its defining characteristic lies in the large-scale deployment of intelligent and heterogeneous devices--such as sensors, actuators, and RFID systems--that are interconnected via the Internet to enable autonomous communication without human intervention [1]. Currently, more than 12 billion IoT devices are in operation, and this number is projected to reach 125 billion by 2030 [2]. Consequently, the volume of data generated by these devices continues to soar, with an expected total of 79.4 ZB by 2025 [3]. In industrial contexts, the integration of IoT technologies has driven the ongoing Industry 4.0 revolution, emphasizing connectivity, automation, and intelligence.

Kaixiang Yang, Jiarong Liu, Yupeng Song, and Yujue Zhou are with the School of Artificial Intelligence, Yunnan University, Kunming 650091, China. Shuanghua Yang is with Beijing Normal University - Hong Kong Baptist University, Zhuhai 519087, China. This work was supported in part by the Yunnan Fundamental Research Projects under Grant 202401AU070151, and in part by the Yunnan Provincial Science and Technology Talent and Platform Plan under Grant 202505AF350053.