Investigating Mask-aware Prototype Learning for Tabular Anomaly Detection

Lu, Ruiying, Liu, Jinhan, Du, Chuan, Guo, Dandan

arXiv.org Artificial Intelligence 

Tabular anomaly detection, which aims to identify deviant samples, is crucial in a variety of real-world applications, such as medical disease identification, financial fraud detection, and intrusion monitoring. Although recent deep learning-based methods have achieved competitive performance, they suffer from representation entanglement and a lack of global correlation modeling, which hinders anomaly detection performance. To tackle these problems, we incorporate mask modeling and prototype learning into tabular anomaly detection. The core idea is to design learnable masks through disentangled representation learning within a projection space, and to extract normal dependencies as explicit global prototypes. Specifically, the overall model involves two parts: (i) during encoding, we perform mask modeling in both the data space and a projection space with orthogonal basis vectors to learn shared, disentangled normal patterns; (ii) during decoding, we decode multiple masked representations in parallel for reconstruction and learn association prototypes to extract normal characteristic correlations. Our proposal derives from a distribution-matching perspective, in which both projection space learning and association prototype learning are formulated as optimal transport problems, and the resulting calibration distances are used to refine the anomaly scores. Quantitative and qualitative experiments on 20 tabular benchmarks demonstrate the effectiveness and interpretability of our model.

Tabular data, often structured as tables in relational databases with rows representing individual data samples and columns representing feature variables, have become indispensable across diverse real-world domains, including intrusion detection in cybersecurity [1], [2], engineering [3], and finance [4].
Tabular anomaly detection (AD), which endeavors to identify samples that diverge from a predefined notion of normality, plays a pivotal role in diverse scientific and industrial contexts, such as medical disease identification [5], financial fraud detection [6], cybersecurity intrusion monitoring [7], [8], and astronomy [9].

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62306125, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2024JC-YBQN-0661, and the Nanning Scientific Research and Technological Development Project (20231042).
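The scoring idea summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. In this sketch, each sample is reconstructed from several masked views, and the score is then refined with an optimal-transport calibration distance to a set of prototypes. This is not the authors' model, and several parts are assumptions. The masks are fixed rather than learned. The trained decoder is replaced by a hypothetical mean-imputation stand-in. There is no projection space or orthogonal basis, and the prototypes are given rather than learned. The helper names (`sinkhorn`, `masked_recon_error`, `ot_calibration`, `anomaly_scores`) and the weight `lam` are hypothetical. Computing the calibration distance with standard entropic optimal transport (Sinkhorn-Knopp iterations) is also an assumption made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: Vector, b: Vector, eps: float = 0.1, iters: int = 200) -> Matrix:
    """Entropic OT plan with row marginals a and column marginals b (Sinkhorn-Knopp)."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); costs should be O(1) to avoid underflow.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def masked_recon_error(x: Vector, masks: List[List[int]], feature_means: Vector) -> float:
    """Mean squared error on hidden entries, averaged over several masked views.

    HYPOTHETICAL stand-in: hidden features are 'reconstructed' with the mean of
    normal training data instead of a trained decoder, and masks are fixed, not learned.
    """
    errs = []
    for mask in masks:  # mask[k] == 1 means feature k is hidden in this view
        hidden = [k for k, hid in enumerate(mask) if hid]
        if hidden:
            errs.append(sum((x[k] - feature_means[k]) ** 2 for k in hidden) / len(hidden))
    return sum(errs) / len(errs) if errs else 0.0


def ot_calibration(samples: Matrix, prototypes: Matrix, eps: float = 0.1) -> Vector:
    """Per-sample transport cost to the (here: given, not learned) prototypes."""
    cost = [[sum((s - p) ** 2 for s, p in zip(x, proto)) for proto in prototypes]
            for x in samples]
    scale = max(max(row) for row in cost) or 1.0  # normalize costs to [0, 1]
    cost = [[c / scale for c in row] for row in cost]
    n, k = len(samples), len(prototypes)
    plan = sinkhorn(cost, [1.0 / n] * n, [1.0 / k] * k, eps)
    # Each row carries mass 1/n, so scaling by n gives an average cost per sample.
    return [n * sum(plan[i][j] * cost[i][j] for j in range(k)) for i in range(n)]


def anomaly_scores(samples: Matrix, masks: List[List[int]], feature_means: Vector,
                   prototypes: Matrix, lam: float = 1.0) -> Vector:
    """Reconstruction error refined by a weighted OT calibration distance (lam is hypothetical)."""
    calib = ot_calibration(samples, prototypes)
    return [masked_recon_error(x, masks, feature_means) + lam * c
            for x, c in zip(samples, calib)]


# Toy usage: two normal-looking rows and one obvious outlier.
means = [0.0, 0.0, 0.0]
masks = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
prototypes = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
batch = [[0.1, -0.1, 0.0], [0.4, 0.6, 0.5], [5.0, -4.0, 6.0]]
scores = anomaly_scores(batch, masks, means, prototypes)
print(scores)
```

Each row of the transport plan carries mass 1/n, so multiplying a sample's transport cost by n gives its average cost. As a result, a point far from every prototype receives a large calibration term, even though balanced optimal transport must still route its mass somewhere. Costs are divided by their maximum before exponentiation so that the Gibbs kernel does not underflow at small `eps`.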