Categorical anomaly detection in heterogeneous data using minimum description length clustering
James Cheney, Xavier Gombau, Ghita Berrada, Sidahmed Benabderrahmane
arXiv.org Artificial Intelligence
Anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.

Two examples of anomaly detection based on MDL have been studied and shown to perform well: the OC3 algorithm [21], based on an itemset mining technique called Krimp [26], and the CompreX algorithm [2]. Broadly speaking, both take a similar approach: first, a model H of the data that compresses it well is found using a heuristic search, balancing the model complexity L(H) (number of bits required to compress the model structure/parameters) against the data complexity L(X | H) (number of bits required to compress the data given the model). Once such a model H is found, we assign to each object x ∈ X a score corresponding to the object's compressed size L(x | H) given the selected model. Intuitively, if the model accurately characterizes the data as a whole, records that are representative will compress well, yielding a low anomaly score.
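The scoring and clustering ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, nor OC3 or CompreX: it assumes a naive model that encodes each attribute independently with smoothed frequencies, it ignores the model cost L(H), and the names `fit_model`, `code_length`, `fit_mixture`, and `mixture_score` are hypothetical. The mixture is fitted with a k-means-style loop in which each record is reassigned to the component that compresses it best.

```python
import math
import random
from collections import Counter

def fit_model(records, domains, alpha=0.5):
    """Fit one smoothed value distribution per attribute (assumed stand-in model H)."""
    model = []
    for j, domain in enumerate(domains):
        counts = Counter(r[j] for r in records)
        total = len(records) + alpha * len(domain)
        # Additive smoothing keeps every probability nonzero, so code lengths stay finite.
        model.append({v: (counts[v] + alpha) / total for v in domain})
    return model

def code_length(record, model):
    """L(x | H): bits needed to encode one record under the model."""
    return -sum(math.log2(model[j][v]) for j, v in enumerate(record))

def fit_mixture(records, domains, k, iters=20, seed=0, init=None):
    """k-means-style fitting: refit each component, then reassign records."""
    rng = random.Random(seed)
    # Start from a caller-supplied assignment, or a random one.
    assign = list(init) if init is not None else [rng.randrange(k) for _ in records]
    models = []
    for _ in range(iters):
        # Refit step: one model per component from its currently assigned records.
        models = [
            fit_model([r for r, a in zip(records, assign) if a == c], domains)
            for c in range(k)
        ]
        # Assignment step: each record moves to the component with the shortest code.
        new_assign = [
            min(range(k), key=lambda c: code_length(r, models[c])) for r in records
        ]
        if new_assign == assign:
            break
        assign = new_assign
    return models

def mixture_score(record, models):
    """Anomaly score: code length under the best-fitting component."""
    return min(code_length(record, m) for m in models)
```

On data that mixes two sources, for example records such as `("sys", "daemon")` and `("user", "shell")`, a single model fitted to everything assigns the cross-source record `("sys", "shell")` the same code length as a normal record, while a two-component mixture that separates the sources gives it a much longer code. A fuller treatment would also charge for the model cost L(H) and for encoding each record's component.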
Jun-14-2020
- Country:
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Information Technology > Security & Privacy (0.93)