Statistical Anomaly Detection for Train Fleets

AI Magazine

We have developed a method for statistical anomaly detection that has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of event messages generated on the trains. The anomaly detection component helps operators quickly find significant deviations from normal behavior and detect early indications of possible problems. The savings in maintenance costs come mainly from avoiding costly breakdowns and have been estimated at several million euros per year for the tool. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10% by using the tool.


Statistical Anomaly Detection for Train Fleets

AAAI Conferences

We have developed a method for statistical anomaly detection that has been deployed in a tool for condition monitoring of train fleets. The tool is currently used by several railway operators around the world to inspect and visualize the occurrence of event messages generated on the trains. The anomaly detection component helps operators quickly find significant deviations from normal behavior and detect early indications of possible problems. The savings in maintenance costs come mainly from avoiding costly breakdowns and have been estimated at several million euros per year for the tool. In the long run, it is expected that maintenance costs can be reduced by between 5 and 10% by using the tool.


Can We Achieve Open Category Detection with Guarantees?

AAAI Conferences

Open category detection is the problem of detecting "alien" test instances that belong to categories/classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring safety and/or quality of test data analysis. Unfortunately, to the best of our knowledge, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien-detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing these gaps by studying a simplified, but practically relevant, variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest. However, at test time, some fraction \alpha of the test examples are aliens. Under the assumption that we know an upper bound on \alpha, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Our empirical results on synthetic and benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.
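
The setting described above implies a simple mixture relation that can be illustrated in code. If every instance receives an anomaly score, and a fraction \alpha of test instances are aliens, then the score CDFs satisfy F_mix = (1 - \alpha) F_nominal + \alpha F_alien, which can be solved for F_alien. The sketch below is a plug-in illustration of that relation only, not the paper's algorithm: the function name `alien_threshold`, the `target_recall` parameter, and the rule "flag scores above the threshold" are all assumptions made for this example, and it omits the finite-sample correction that a PAC-style guarantee would require.

```python
import numpy as np


def alien_threshold(nominal_scores, mixture_scores, alpha, target_recall=0.95):
    """Hypothetical helper: choose a score threshold from empirical CDFs.

    Instances with score > threshold are flagged as aliens. The threshold is
    the largest candidate reached before the estimated alien CDF first
    exceeds 1 - target_recall (i.e. before too many aliens would be missed).
    """
    nominal = np.sort(np.asarray(nominal_scores, dtype=float))
    mixture = np.sort(np.asarray(mixture_scores, dtype=float))
    candidates = np.unique(np.concatenate([nominal, mixture]))

    # Empirical CDFs: fraction of scores <= each candidate threshold.
    f_nominal = np.searchsorted(nominal, candidates, side="right") / nominal.size
    f_mixture = np.searchsorted(mixture, candidates, side="right") / mixture.size

    # Solve F_mix = (1 - alpha) * F_nominal + alpha * F_alien for F_alien.
    f_alien = (f_mixture - (1.0 - alpha) * f_nominal) / alpha

    exceeds = f_alien > 1.0 - target_recall
    if not exceeds.any():
        return candidates[-1]
    first = int(np.argmax(exceeds))  # index of the first crossing
    if first == 0:
        return -np.inf  # flag everything
    return candidates[first - 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=2000)      # scores on the clean training set
    aliens = rng.normal(5.0, 1.0, size=400)      # synthetic alien scores
    test = np.concatenate([rng.normal(0.0, 1.0, size=1600), aliens])
    thr = alien_threshold(clean, test, alpha=0.2)
    print("threshold:", thr, "alien recall:", np.mean(aliens > thr))
```

Because the estimate is noisy where nominal scores are dense, this plug-in rule tends to stop early and pick a conservative (low) threshold, which favors alien recall at the cost of extra false alarms. Passing an upper bound on \alpha instead of its true value has a similar conservative effect.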


Incident Detection and Investigation - How Math...

#artificialintelligence

If you move from rules and heuristics to automated anomaly detection and machine learning, you will undoubtedly see outliers and risky behaviors that you previously did not. Your rules were most likely aimed at identifying patterns that your team already knows indicate malicious activity, whereas anomaly detection tools should not be restricted by your team's knowledge. However, leaving out that knowledge means that many of the outliers identified will be legitimate for your organization. So instead of sifting through thousands of false positives that broke a yes/no rule, you will have thousands of false positives on a risk scale from low to high.


Real-Time Nonparametric Anomaly Detection in High-Dimensional Settings

arXiv.org Machine Learning

Timely and reliable detection of abrupt anomalies, e.g., faults, intrusions/attacks, is crucial for real-time monitoring and security of many modern systems such as the smart grid and the Internet of Things (IoT) networks that produce high-dimensional data. With this goal, we propose effective and scalable algorithms for real-time anomaly detection in high-dimensional settings. Our proposed algorithms are nonparametric (model-free) as both the nominal and anomalous multivariate data distributions are assumed to be unknown. We extract useful univariate summary statistics and perform the anomaly detection task in a single-dimensional space. We model anomalies as persistent outliers and propose to detect them via a cumulative sum (CUSUM)-like algorithm. In case the observed data stream has a low intrinsic dimensionality, we find a low-dimensional submanifold in which the nominal data are embedded and then evaluate whether the sequentially acquired data persistently deviate from the nominal submanifold. Further, in the general case, we determine an acceptance region for nominal data via the Geometric Entropy Minimization (GEM) method and then evaluate whether the sequentially observed data persistently fall outside the acceptance region. We provide an asymptotic lower bound on the average false alarm period of the proposed CUSUM-like algorithm. Moreover, we provide a sufficient condition to asymptotically guarantee that the decision statistic of the proposed algorithm does not diverge in the absence of anomalies. Numerical studies illustrate the effectiveness of the proposed schemes in quick and accurate detection of changes/anomalies in a variety of high-dimensional settings.
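
The idea of reducing each high-dimensional observation to a univariate statistic and accumulating evidence of persistent outliers can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it uses the distance to the k-th nearest nominal neighbor as the summary statistic, takes the (1 - alpha) quantile of leave-one-out distances on the nominal set as a baseline, and accumulates `distance - baseline` in a CUSUM-like recursion. The function names, the parameters `k`, `alpha`, and `threshold`, and the exact form of the increment are all choices made for this example.

```python
import numpy as np


def kth_nn_distances(queries, reference, k, exclude_self=False):
    """Euclidean distance from each query row to its k-th nearest reference row."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    dists.sort(axis=1)
    # When queries are the reference set itself, column 0 is the zero
    # self-distance, so the k-th true neighbor sits at column k.
    return dists[:, k if exclude_self else k - 1]


def cusum_like_detector(stream, nominal, k=5, alpha=0.05, threshold=10.0):
    """Return the index of the first alarm in `stream`, or None if none is raised."""
    nominal = np.asarray(nominal, dtype=float)
    baseline = np.quantile(
        kth_nn_distances(nominal, nominal, k, exclude_self=True), 1.0 - alpha
    )
    stat = 0.0
    for t, x in enumerate(np.asarray(stream, dtype=float)):
        d = kth_nn_distances(x[None, :], nominal, k)[0]
        # Isolated outliers are absorbed by the reset at zero; only
        # persistent exceedances push the statistic over the threshold.
        stat = max(0.0, stat + (d - baseline))
        if stat >= threshold:
            return t
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    nominal = rng.normal(size=(300, 10))
    before = rng.normal(size=(100, 10))
    after = rng.normal(loc=3.0, size=(100, 10))  # synthetic mean shift at t = 100
    print("alarm at:", cusum_like_detector(np.vstack([before, after]), nominal))
```

The threshold trades detection delay against false alarms: a larger value lengthens the expected time between false alarms but also delays detection after a real change.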