Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

Lau, Matthew, Zhou, Tian-Yi, Yuan, Xiangchi, Chen, Jizhou, Lee, Wenke, Huo, Xiaoming

Jun-18-2025–arXiv.org Machine Learning

Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from (synthetic) anomalies. We extend this principle to semi-supervised AD, where training data also include a limited labeled subset of anomalies possibly present in test time. We propose a theoretically-grounded and empirically effective framework for semi-supervised AD that combines known and synthetic anomalies during training. To analyze semi-supervised AD, we introduce the first mathematical formulation of semi-supervised AD, which generalizes unsupervised AD. Here, we show that synthetic anomalies enable (i) better anomaly modeling in low-density regions and (ii) optimal convergence guarantees for neural network classifiers -- the first theoretical result for semi-supervised AD. We empirically validate our framework on five diverse benchmarks, observing consistent performance gains. These improvements also extend beyond our theoretical framework to other classification-based AD methods, validating the generalizability of the synthetic anomaly principle in AD.

anomaly, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

Jun-18-2025

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East > Jordan (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government > Military
  - Cyberwarfare (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found