Statistical Inference for Clustering-based Anomaly Detection

Phu, Nguyen Thi Minh; Loc, Duong Tan; Duy, Vo Nguyen Le

Apr-25-2025, arXiv.org Machine Learning

Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this approach provides no guarantee on the reliability of the anomalies it detects. In this paper, we propose SI-CLAD (Statistical Inference for CLustering-based Anomaly Detection), a novel statistical framework for testing the results of clustering-based AD. The key strength of SI-CLAD is that it rigorously controls the probability of falsely identifying anomalies, keeping it below a pre-specified significance level $\alpha$ (e.g., $\alpha = 0.05$). By analyzing the selection mechanism inherent in clustering-based AD and leveraging the Selective Inference (SI) framework, we prove that false detection control is attainable. Moreover, we introduce a strategy to boost the true detection rate, enhancing the overall performance of SI-CLAD. Extensive experiments on synthetic and real-world datasets provide strong empirical support for our theoretical findings and showcase the superior performance of the proposed method.
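
To see why detected anomalies need a dedicated test, the following minimal sketch simulates data with no true anomalies, flags the most extreme point, and then tests it with a naive z-test that ignores the selection step. It is an illustration under stated assumptions, not the SI-CLAD procedure: the selection rule is a simple maximum-deviation rule standing in for clustering, the data are one-dimensional standard normal, and the function name `naive_false_detection_rate` and its parameters are hypothetical.

```python
import math

import numpy as np


def naive_false_detection_rate(n_points=20, n_trials=2000, alpha=0.05, seed=0):
    """Estimate how often a naive test rejects on pure-noise data.

    Assumption: a max-deviation rule replaces clustering-based selection,
    and the known null distribution is N(0, 1).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        # Null data: every point is drawn from N(0, 1), so no true anomaly exists.
        x = rng.standard_normal(n_points)
        # Selection step: flag the point with the largest absolute deviation.
        z = float(np.max(np.abs(x)))
        # Naive two-sided p-value that treats the flagged point as if it
        # had been chosen before looking at the data.
        p_naive = math.erfc(z / math.sqrt(2.0))
        if p_naive <= alpha:
            rejections += 1
    return rejections / n_trials


rate = naive_false_detection_rate()
print(f"naive false detection rate: {rate:.3f}")
```

With 20 points per trial, the naive rate should land near $1 - 0.95^{20} \approx 0.64$, far above $\alpha = 0.05$. This inflation comes from selection bias, which selective inference is designed to correct by conditioning on the selection event.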

artificial intelligence, data mining, machine learning

Country:
- North America > United States
  - Wisconsin (0.04)
- Asia
  - Japan (0.04)
  - Vietnam > Hồ Chí Minh City
    - Hồ Chí Minh City (0.04)
  - Middle East > UAE
    - Dubai Emirate > Dubai (0.04)

Genre:
- Research Report > Experimental Study (0.34)

Industry:
- Health & Medicine > Therapeutic Area (0.95)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (0.68)
