Statistical Anomaly Detection for Train Fleets

AI Magazine

We have developed a method for statistical anomaly detection that has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of "event messages" generated on the trains. Its anomaly detection component helps the operators quickly find significant deviations from normal behavior and detect early indications of possible problems. The method is based on Bayesian principal anomaly, a framework for parametric anomaly detection using Bayesian statistics. The savings in maintenance costs come mainly from avoiding costly breakdowns and have been estimated at several million euros per year for the tool. In the long run, maintenance costs are expected to be reduced by between 5 and 10 percent with the help of the tool.

Anomaly detection has long been used for fraud detection and intrusion detection, but in recent years its use has spread to all kinds of domains, such as surveillance, industrial system monitoring, and epidemiology. For an overview of different anomaly-detection methods and applications, see, for example, Chandola, Banerjee, and Kumar (2009). The approach taken in statistical anomaly detection is to use data from (predominantly normal) previous situations to build a statistical model of what is normal. New situations are compared against that model and are considered anomalous if they are too improbable under it. The Swedish Institute of Computer Science (SICS) has for several years developed methods for statistical anomaly detection based on the Bayesian principal anomaly framework (Holst and Ekman 2011).
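The idea of building a statistical model of normal behavior and flagging observations that are too improbable under it can be illustrated on event-message counts. The sketch below is a minimal illustration under stated assumptions, not the Bayesian principal anomaly method used in the tool: it assumes a Poisson model for how often one event message occurs per day, a conjugate Gamma prior on the rate (the hypothetical parameters `prior_shape` and `prior_rate`), and it scores a new daily count by its upper-tail probability under the resulting negative-binomial posterior predictive distribution. The data are made up.

```python
import math
from typing import Sequence


def predictive_log_pmf(x: int, shape: float, rate: float) -> float:
    """Log posterior-predictive probability of count x.

    With a Gamma(shape, rate) posterior on a Poisson rate, the predictive
    distribution of a new count is negative binomial with r = shape and
    success probability p = rate / (rate + 1).
    """
    return (
        math.lgamma(x + shape)
        - math.lgamma(shape)
        - math.lgamma(x + 1)
        + shape * math.log(rate / (rate + 1.0))
        - x * math.log(rate + 1.0)
    )


def upper_tail_probability(
    x: int,
    history: Sequence[int],
    prior_shape: float = 1.0,  # hypothetical weak prior, chosen for illustration
    prior_rate: float = 1.0,
) -> float:
    """P(X >= x) for a new daily count, given counts from (mostly normal) past days."""
    # Conjugate update: observed events add to the shape, observed days add to the rate.
    shape = prior_shape + sum(history)
    rate = prior_rate + len(history)
    below = sum(math.exp(predictive_log_pmf(k, shape, rate)) for k in range(x))
    # Clamp at 0 because 1 - below can dip slightly below zero from rounding.
    return max(0.0, 1.0 - below)


# Made-up daily counts of one event message on one train during a normal period.
history = [2, 3, 1, 2, 2, 3, 2, 1]
for today in (2, 6, 15):
    p = upper_tail_probability(today, history)
    print(f"count={today:2d}  tail probability={p:.2e}")
```

A small tail probability, for example below a chosen alarm level, marks the day as anomalous. Computing the tail as one minus the lower terms loses relative precision for very small probabilities; summing the upper terms directly would be more accurate if exact small values matter.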


pyISC: A Bayesian Anomaly Detection Framework for Python

AAAI Conferences

pyISC is a Python API and extension to the C++-based Incremental Stream Clustering (ISC) anomaly detection and classification framework. The framework is based on parametric Bayesian statistical inference using the Bayesian Principal Anomaly (BPA), which makes it possible to combine the output of several probability distributions. pyISC is designed to be easy to use and to integrate with other Python libraries, specifically those used for data science. In this paper, we show how to use the framework, and we compare its performance with that of other well-known methods on 22 real-world datasets. The simulation results show that the performance of pyISC is comparable to that of the other methods. pyISC is part of the Stream toolbox developed within the STREAM project.
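As a rough illustration of combining the output of several probability distributions (this is neither pyISC's API nor the BPA computation), the sketch below combines two independent component models, a Gaussian for a continuous feature and a Poisson for a count feature, by summing their log-likelihoods. It then scores a new row by the fraction of training rows that are at least as likely. The class name `CombinedModel`, the two-column row layout, and the data are hypothetical.

```python
import math
import statistics
from bisect import bisect_right
from typing import List, Sequence, Tuple

Row = Tuple[float, int]  # (continuous measurement, event count), hypothetical layout


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


class CombinedModel:
    """Independent Gaussian and Poisson components combined by summing log-likelihoods."""

    def fit(self, rows: Sequence[Row]) -> "CombinedModel":
        values = [r[0] for r in rows]
        counts = [r[1] for r in rows]
        self.mu = statistics.fmean(values)
        self.sigma = statistics.pstdev(values) or 1.0  # guard against zero spread
        self.lam = max(statistics.fmean(counts), 1e-9)  # guard against log(0)
        # Sorted training log-likelihoods act as the reference distribution of scores.
        self.train_ll: List[float] = sorted(self.log_likelihood(r) for r in rows)
        return self

    def log_likelihood(self, row: Row) -> float:
        value, count = row
        return gaussian_logpdf(value, self.mu, self.sigma) + poisson_logpmf(count, self.lam)

    def anomaly_score(self, row: Row) -> float:
        """1 minus the fraction of training rows whose log-likelihood is <= this row's."""
        ll = self.log_likelihood(row)
        at_most = bisect_right(self.train_ll, ll)
        return 1.0 - at_most / len(self.train_ll)


# Made-up training data: measurements between 19.0 and 21.0, counts of 2 to 4 events.
train = [(20.0 + 0.5 * ((i % 5) - 2), 2 + i % 3) for i in range(30)]
model = CombinedModel().fit(train)
print(model.anomaly_score((20.0, 3)))   # typical row: score near 0
print(model.anomaly_score((35.0, 15)))  # far from both components: score 1.0
```

Summing log-likelihoods assumes the components are independent. Ranking a new row against the training scores, rather than thresholding the raw combined density, gives a score in [0, 1] that does not depend on the units of the individual components.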


Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms

arXiv.org Machine Learning

A general Intrusion Detection System (IDS) is fundamentally based on an Anomaly Detection System (ADS), or on a combination of anomaly detection and signature-based methods: it gathers and analyzes observations and reports possible suspicious cases to a system administrator or other users for further investigation. One of the notorious challenges that even state-of-the-art ADS and IDS have not overcome is a potentially very high false alarm rate. Especially in very large and complex systems, the volume of low-level alarms easily overwhelms administrators and increases their tendency to ignore alerts. We can group the existing false alarm mitigation strategies into two main families. The first covers methods customized and applied directly toward higher-quality anomaly scoring in ADS. The second includes approaches used in related contexts as filters to reduce false alarm rates. Given the lack of a comprehensive study of possible ways to mitigate false alarm rates, in this paper we review the existing techniques for false alarm mitigation in ADS and present the pros and cons of each technique. We also study a few promising techniques applied in signature-based IDS and in other related contexts, such as commercial Security Information and Event Management (SIEM) tools, that are applicable to the ADS context. Finally, we conclude with some directions for future research.
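One simple example of a filter placed after the anomaly scorer is a persistence (k-out-of-n) rule: raise an alert only when the score exceeds a threshold in at least k of the last n observations, so that isolated spikes are suppressed. The sketch below is a generic illustration chosen here, not a technique attributed to the survey; the class name `PersistenceFilter`, the helper `filter_alerts`, and their parameters are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


class PersistenceFilter:
    """Raise an alert only if at least k of the last n scores exceed the threshold."""

    def __init__(self, threshold: float, k: int, n: int) -> None:
        if not 1 <= k <= n:
            raise ValueError("require 1 <= k <= n")
        self.threshold = threshold
        self.k = k
        self.window: Deque[bool] = deque(maxlen=n)  # the oldest flag drops out automatically

    def update(self, score: float) -> bool:
        self.window.append(score > self.threshold)
        return sum(self.window) >= self.k


def filter_alerts(scores: Iterable[float], threshold: float = 0.8,
                  k: int = 3, n: int = 4) -> List[bool]:
    f = PersistenceFilter(threshold, k, n)
    return [f.update(s) for s in scores]


# The lone spike at index 1 is suppressed; only the sustained run at the end alerts.
print(filter_alerts([0.1, 0.9, 0.1, 0.1, 0.9, 0.95, 0.99]))
```

The price of this kind of filtering is detection delay: a genuine anomaly is reported only after k exceedances have accumulated within the window.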


Online Multivariate Anomaly Detection and Localization for High-dimensional Settings

arXiv.org Machine Learning

This paper considers the real-time detection of anomalies in high-dimensional systems. The goal is to detect anomalies quickly and accurately so that appropriate countermeasures can be taken in time, before the system is harmed. We propose a sequential, multivariate anomaly detection method that scales well to high-dimensional datasets. The method is nonparametric (data-driven) and semi-supervised (it trains only on nominal data), so it is applicable to a wide range of applications and data types. Thanks to its multivariate nature, it can quickly and accurately detect challenging anomalies, such as changes in the correlation structure and stealthy low-rate cyberattacks. Its asymptotic optimality and computational complexity are analyzed comprehensively. In conjunction with the detection method, an effective technique for localizing the anomalous data dimensions is also proposed. We further extend the detection and localization methods to a supervised setup in which an additional anomaly dataset is available, and combine the semi-supervised and supervised algorithms to obtain an online learning algorithm under the semi-supervised framework. The practical use of the proposed algorithms is demonstrated in DDoS attack mitigation, and their performance is evaluated using a real IoT-botnet dataset and simulations.
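A nonparametric, semi-supervised sequential detector of this general kind can be sketched by combining a k-nearest-neighbor distance, computed against nominal training data only, with a CUSUM statistic that accumulates evidence over time. This is a simplified illustration under our own assumptions, not the paper's statistic: the reference/calibration split of the nominal data, the normalized increment `d / baseline - 1`, and the parameters `k`, `quantile`, and `threshold` are hypothetical choices, and the data are synthetic.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def knn_distance(x: Vector, reference: Sequence[Vector], k: int) -> float:
    """Distance from x to its k-th nearest neighbor in the reference set."""
    return sorted(math.dist(x, r) for r in reference)[k - 1]


class KnnCusum:
    """Sequential detector trained only on nominal (anomaly-free) data."""

    def __init__(self, nominal: Sequence[Vector], k: int = 3,
                 quantile: float = 0.95, threshold: float = 3.0) -> None:
        half = len(nominal) // 2
        self.reference = list(nominal[:half])  # neighbors are searched here
        # The other half calibrates what a "normal" kNN distance looks like.
        calib = sorted(knn_distance(x, self.reference, k) for x in nominal[half:])
        self.baseline = calib[min(len(calib) - 1, int(quantile * len(calib)))]
        self.k = k
        self.threshold = threshold
        self.stat = 0.0

    def update(self, x: Vector) -> bool:
        """Feed one observation; return True while the CUSUM is above the threshold."""
        d = knn_distance(x, self.reference, self.k)
        # Nominal points mostly give negative increments, so the statistic stays near 0;
        # persistently distant points make it grow.
        self.stat = max(0.0, self.stat + d / self.baseline - 1.0)
        return self.stat > self.threshold


random.seed(0)
dim = 10
nominal = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
detector = KnnCusum(nominal)

normal_stream = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
shifted_stream = [[random.gauss(3, 1) for _ in range(dim)] for _ in range(10)]
alarms_normal = [detector.update(x) for x in normal_stream]
alarms_shifted = [detector.update(x) for x in shifted_stream]
print(any(alarms_normal), any(alarms_shifted))
```

Because each increment is the relative excess of the kNN distance over a nominal quantile, the threshold does not depend on the scale of the data. This sketch never resets the statistic after an alarm; a deployed detector would typically reset it once the alarm has been handled.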