pyISC: A Bayesian Anomaly Detection Framework for Python

AAAI Conferences

pyISC is a Python API and extension to the C++-based Incremental Stream Clustering (ISC) anomaly detection and classification framework. The framework is based on parametric Bayesian statistical inference using the Bayesian Principal Anomaly (BPA), which makes it possible to combine the output from several probability distributions. pyISC is designed to be easy to use and to integrate with other Python libraries, specifically those used for data science. In this paper, we show how to use the framework and compare its performance to other well-known methods on 22 real-world datasets. The simulation results show that the performance of pyISC is comparable to that of the other methods. pyISC is part of the Stream toolbox developed within the STREAM project.
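To make the idea behind parametric Bayesian anomaly scoring concrete, the sketch below fits a one-dimensional Gaussian with a conjugate Normal-Inverse-Gamma prior to "normal" training data and scores new observations by the two-sided tail probability of the posterior predictive Student-t distribution. This is a minimal illustration of the general principle only; the function names are made up for this example and are not the pyISC API, and the actual BPA additionally combines scores from several distributions.

```python
# Minimal sketch of parametric Bayesian anomaly scoring (illustrative only;
# this is NOT the pyISC API). A Gaussian model with a conjugate
# Normal-Inverse-Gamma prior is fitted to "normal" training data, and new
# observations are scored by the two-sided tail probability of the
# posterior predictive distribution (a Student-t).
import numpy as np
from scipy import stats


def fit_posterior(x, mu0=0.0, kappa0=1e-3, alpha0=1e-3, beta0=1e-3):
    """Conjugate Normal-Inverse-Gamma update for a 1-D Gaussian."""
    n = len(x)
    xbar = np.mean(x)
    ss = np.sum((x - xbar) ** 2)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def anomaly_score(x_new, mu_n, kappa_n, alpha_n, beta_n):
    """Negative log of the two-sided predictive tail probability."""
    # The posterior predictive is Student-t with 2*alpha_n degrees of freedom.
    df = 2.0 * alpha_n
    scale = np.sqrt(beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n))
    tail = 2.0 * stats.t.sf(np.abs(x_new - mu_n) / scale, df)
    return -np.log(np.clip(tail, 1e-300, 1.0))


rng = np.random.default_rng(0)
train = rng.normal(10.0, 2.0, size=500)                 # mostly "normal" behaviour
params = fit_posterior(train)
print(anomaly_score(np.array([10.5, 25.0]), *params))   # low score vs. high score
```

A full framework would add multivariate and discrete models and a principled way to combine the per-model scores; the point here is only the tail-probability view of anomaly.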


An Adaptive Approach for Anomaly Detector Selection and Fine-Tuning in Time Series

arXiv.org Machine Learning

Anomaly detection in time series is a hot topic in time series data mining. Each anomaly detector has its own characteristics that determine which types of abnormal data it handles well, and no single detector is optimal for all types of anomalies. Moreover, applying such detectors in industrial production remains difficult because, for example, a single detector cannot be tuned optimally for different time windows of the same time series. This paper proposes an adaptive model, ATSDLN (Adaptive Time Series Detector Learning Network), which selects an appropriate detector and its run-time parameters based on the characteristics of the time series. We take the time series as the input of the model and learn a representation of it through an FCN. To realize adaptive selection of detectors and run-time parameters for the input time series, the outputs of the FCN feed two sub-networks: a detector selection network and a run-time parameter selection network. In addition, the variable layer-width design of the parameter selection sub-network and the use of transfer learning make the model more extensible. Experiments show that ATSDLN can select appropriate anomaly detectors and run-time parameters, is highly extensible, and transfers quickly. We evaluate ATSDLN on public data sets, where our method outperforms other methods in most cases with higher effectiveness and better adaptation. We also present experimental results on public data sets that demonstrate how the model structure and transfer learning affect effectiveness.
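The abstract does not give the exact architecture, so the following is only a rough sketch under stated assumptions: FCN is taken to mean a 1-D fully convolutional encoder over the raw series, and the two sub-networks are reduced to a classification head (detector choice) and a fixed-width head (run-time parameters). All layer sizes, class names, and the numbers of detectors and parameters are illustrative.

```python
# Rough sketch of an ATSDLN-style architecture (illustrative only; the
# paper's exact layers and parameterisation are not specified in the
# abstract). A 1-D convolutional encoder maps the raw time series to a
# shared representation consumed by two heads: one that picks a detector
# and one that predicts its run-time parameters.
import torch
import torch.nn as nn


class ATSDLNSketch(nn.Module):
    def __init__(self, n_detectors=5, n_params=3, channels=(64, 128, 64)):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch, k in zip(channels, (8, 5, 3)):
            layers += [nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
                       nn.BatchNorm1d(out_ch), nn.ReLU()]
            in_ch = out_ch
        self.encoder = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)                  # global average pooling
        self.detector_head = nn.Linear(in_ch, n_detectors)   # which detector to run
        self.param_head = nn.Linear(in_ch, n_params)         # its run-time parameters

    def forward(self, x):                  # x: (batch, 1, series_length)
        z = self.pool(self.encoder(x)).squeeze(-1)
        return self.detector_head(z), self.param_head(z)


model = ATSDLNSketch()
series = torch.randn(4, 1, 200)            # four toy series of length 200
detector_logits, param_estimates = model(series)
print(detector_logits.shape, param_estimates.shape)   # (4, 5) and (4, 3)
```

In the paper, the parameter-selection sub-network uses a variable layer width and transfer learning so that new detectors can be accommodated; a fixed-width head is used here purely for brevity.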


Statistical Anomaly Detection for Train Fleets

AAAI Conferences

We have developed a method for statistical anomaly detection which has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of event messages generated on the trains. The anomaly detection component helps the operators to quickly find significant deviations from normal behavior and to detect early indications of possible problems. The savings in maintenance costs come mainly from avoiding costly breakdowns and have been estimated at several million Euros per year for the tool. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10% by using the tool.


Sequential Feature Explanations for Anomaly Detection

arXiv.org Machine Learning

In many applications, an anomaly detection system presents the most anomalous data instance to a human analyst, who then must determine whether the instance is truly of interest (e.g. a threat in a security setting). Unfortunately, most anomaly detectors provide no explanation about why an instance was considered anomalous, leaving the analyst with no guidance about where to begin the investigation. To address this issue, we study the problems of computing and evaluating sequential feature explanations (SFEs) for anomaly detectors. An SFE of an anomaly is a sequence of features, which are presented to the analyst one at a time (in order) until the information contained in the highlighted features is enough for the analyst to make a confident judgement about the anomaly. Since analyst effort is related to the amount of information that they consider in an investigation, an explanation's quality is related to the number of features that must be revealed to attain confidence. One of our main contributions is to present a novel framework for large scale quantitative evaluations of SFEs, where the quality measure is based on analyst effort. To do this we construct anomaly detection benchmarks from real data sets along with artificial experts that can be simulated for evaluation. Our second contribution is to evaluate several novel explanation approaches within the framework and on traditional anomaly detection benchmarks, offering several insights into the approaches.
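As a deliberately simple illustration of the SFE idea, and not one of the explanation approaches evaluated in the paper, the sketch below orders features by their individual contribution to an anomaly score under an independent-Gaussian model fitted to nominal data; the analyst would then be shown the features in that order.

```python
# Toy sequential feature explanation (SFE): rank features by how much each
# one individually contributes to the anomaly score of an instance under an
# independent-Gaussian model of normal data. A simple illustration of the
# SFE idea, not one of the explanation approaches from the paper.
import numpy as np


def fit_normal_model(X):
    """Per-feature mean and standard deviation of the nominal data."""
    return X.mean(axis=0), X.std(axis=0) + 1e-9


def sfe(x, mean, std):
    """Return feature indices ordered by decreasing per-feature anomaly."""
    # Per-feature negative log-likelihood under the fitted Gaussians.
    per_feature_score = 0.5 * ((x - mean) / std) ** 2 + np.log(std)
    return np.argsort(per_feature_score)[::-1]


rng = np.random.default_rng(1)
X_normal = rng.normal(0.0, 1.0, size=(1000, 6))          # nominal training data
mean, std = fit_normal_model(X_normal)

x_anomaly = np.array([0.1, -0.3, 6.0, 0.2, -4.0, 0.0])   # features 2 and 4 deviate
order = sfe(x_anomaly, mean, std)
print(order)   # the analyst would be shown features 2 and 4 first
```

Real detectors rarely score features independently, which is precisely why the paper needs a quantitative framework, with simulated analysts, to compare competing ways of constructing SFEs.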


Statistical Anomaly Detection for Train Fleets

AI Magazine

The tool is currently used by several railway operators across the world to inspect and visualize the occurrence of "event messages" generated on the trains. The anomaly detection component helps the operators to quickly find significant deviations from normal behavior and to detect early indications of possible problems. The method used is based on the Bayesian principal anomaly, a framework for parametric anomaly detection using Bayesian statistics. The savings in maintenance costs from using the tool come mainly from avoiding costly breakdowns and have been estimated at several million Euros per year. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10 percent with the help of the tool. Anomaly detection has long been used for fraud detection and intrusion detection, but in recent years its use has spread to all kinds of domains, such as surveillance, industrial system monitoring, and epidemiology. For an overview of different anomaly-detection methods and applications, see, for example, Chandola, Banerjee, and Kumar (2009). The approach taken in statistical anomaly detection is to use data from (predominantly normal) previous situations to build a statistical model of what is normal. New situations are compared against that model and are considered anomalous if they are too improbable under it. The Swedish Institute of Computer Science (SICS) has for several years developed methods for statistical anomaly detection based on a framework called the Bayesian principal anomaly (Holst and Ekman 2011).
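The deployed models are not described in detail in this excerpt, so the snippet below is only a minimal sketch of the "model the normal, flag the improbable" principle in the event-message setting: daily counts of one event message are modelled with a conjugate Gamma-Poisson model, and a new count is flagged when its tail probability under the negative-binomial posterior predictive is very small. All names, priors, and thresholds are illustrative, not those of the actual tool.

```python
# Minimal sketch of the "model the normal, flag the improbable" approach
# for event-message counts (illustrative only; not the deployed tool's
# model). Daily counts of one event message follow a conjugate
# Gamma-Poisson model; a new day's count is flagged if its predictive
# tail probability is tiny.
import numpy as np
from scipy import stats


def fit_gamma_poisson(counts, alpha0=1.0, beta0=1.0):
    """Conjugate update: Gamma(alpha0, beta0) prior on the Poisson rate."""
    return alpha0 + np.sum(counts), beta0 + len(counts)


def tail_probability(count, alpha_n, beta_n):
    """P(X >= count) under the negative-binomial posterior predictive."""
    p = beta_n / (beta_n + 1.0)
    return stats.nbinom.sf(count - 1, alpha_n, p)


rng = np.random.default_rng(2)
history = rng.poisson(3.0, size=90)          # ~3 occurrences per day is normal
alpha_n, beta_n = fit_gamma_poisson(history)

for todays_count in (4, 15):
    p = tail_probability(todays_count, alpha_n, beta_n)
    print(todays_count, p, "ANOMALY" if p < 1e-3 else "ok")
```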