Statistical Anomaly Detection for Train Fleets

AI Magazine

The tool is currently used by several railway operators across the world to inspect and visualize the occurrence of "event messages" generated on the trains. The anomaly detection component helps the operators to quickly find significant deviations from normal behavior and to detect early indications of possible problems. The method used is based on Bayesian principal anomaly, a framework for parametric anomaly detection using Bayesian statistics. The savings in maintenance costs from using the tool come mainly from avoiding costly breakdowns and have been estimated at several million euros per year. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10 percent with the help of the tool. Anomaly detection has been used for fraud detection and intrusion detection for a long time, but in recent years its use has exploded into all kinds of domains, such as surveillance, industrial system monitoring, and epidemiology. For an overview of different anomaly-detection methods and applications, see, for example, Chandola, Banerjee, and Kumar (2009). The approach taken in statistical anomaly detection is to use data from (predominantly normal) previous situations to build a statistical model of what is normal. New situations are compared against that model and are considered anomalous if they are too improbable under that model. The Swedish Institute of Computer Science (SICS) has for several years developed methods for statistical anomaly detection based on a framework called Bayesian principal anomaly (Holst and Ekman 2011).
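
The general recipe described above (fit a model to predominantly normal history, then flag new observations that are too improbable under it) can be sketched in a few lines. This is only an illustrative sketch, not the Bayesian principal anomaly method of Holst and Ekman: it assumes that the number of event messages per period follows a Poisson distribution, and the function names, the sample counts, and the 0.001 threshold are all hypothetical choices made for this example.

```python
import math


def fit_poisson_rate(counts):
    """Estimate the Poisson rate as the mean of the historical counts."""
    return sum(counts) / len(counts)


def poisson_upper_tail(k, rate):
    """Return P(X >= k) for X ~ Poisson(rate), computed as 1 - P(X <= k - 1)."""
    cdf = 0.0
    term = math.exp(-rate)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= rate / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def is_anomalous(k, rate, threshold=1e-3):
    """Flag a count whose upper-tail probability falls below the threshold."""
    return poisson_upper_tail(k, rate) < threshold


# Hypothetical event-message counts per day from (mostly normal) operation.
history = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4]
rate = fit_poisson_rate(history)

for observed in (5, 15):
    label = "anomalous" if is_anomalous(observed, rate) else "normal"
    print(observed, label)
```

A point estimate of the rate ignores the uncertainty that comes from having only a short history; a Bayesian treatment would instead average over the posterior distribution of the rate.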


Statistical Anomaly Detection for Train Fleets

AAAI Conferences

We have developed a method for statistical anomaly detection that has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of event messages generated on the trains. The anomaly detection component helps the operators to quickly find significant deviations from normal behavior and to detect early indications of possible problems. The savings in maintenance costs come mainly from avoiding costly breakdowns and have been estimated at several million euros per year for the tool. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10% by using the tool.


pyISC: A Bayesian Anomaly Detection Framework for Python

AAAI Conferences

pyISC is a Python API and extension to the C++-based Incremental Stream Clustering (ISC) anomaly detection and classification framework. The framework is based on parametric Bayesian statistical inference using the Bayesian Principal Anomaly (BPA), which makes it possible to combine the output from several probability distributions. pyISC is designed to be easy to use and to integrate with other Python libraries, specifically those used for data science. In this paper, we show how to use the framework and compare its performance to other well-known methods on 22 real-world datasets. The simulation results show that the performance of pyISC is comparable to that of the other methods. pyISC is part of the Stream toolbox developed within the STREAM project.


Experimental Comparison of Online Anomaly Detection Algorithms

AAAI Conferences

Anomaly detection methods abound and are used extensively in streaming settings in a wide variety of domains. But a strength can also be a weakness; given the vast number of methods, how can one select the best method for their application? Unfortunately, there is no one best way for all domains. Existing literature is focused on creating new anomaly detection methods or creating large frameworks for experimenting with multiple methods at the same time. As the literature continues to grow, extensive evaluation of every available anomaly detection method is not feasible. To reduce this evaluation burden, in this paper we present a framework to intelligently choose the optimal anomaly detection methods based on the characteristics the time series displays. We provide a comprehensive experimental validation of multiple anomaly detection methods over different time series characteristics to form guidelines. Applying our framework can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods.


Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms

arXiv.org Machine Learning

A general Intrusion Detection System (IDS) is fundamentally based on an Anomaly Detection System (ADS) or on a combination of anomaly detection and signature-based methods: it gathers and analyzes observations and reports possibly suspicious cases to a system administrator or other users for further investigation. One of the notorious challenges that even state-of-the-art ADS and IDS have not overcome is the possibility of a very high false alarm rate. Especially in very large and complex system settings, the number of low-level alarms easily overwhelms administrators and increases their tendency to ignore alerts. We can group the existing false alarm mitigation strategies into two main families. The first covers methods customized and applied directly toward higher-quality anomaly scoring in ADS. The second includes approaches used in related contexts as filtering methods to reduce the false alarm rate. Given the lack of a comprehensive study of possible ways to mitigate false alarm rates, in this paper we review the existing techniques for false alarm mitigation in ADS and present the pros and cons of each technique. We also study a few promising techniques applied in signature-based IDS and in other related contexts, such as commercial Security Information and Event Management (SIEM) tools, that are applicable in the ADS context. Finally, we conclude with some directions for future research.