A common need when analyzing real-world data sets is determining which data points stand out as different from all other data points. Such data points are known as anomalies. This article was originally published on Medium by Davis David. In this article, you will learn a couple of machine-learning-based approaches to anomaly detection, and part two will show how to apply one of these approaches to a specific anomaly detection use case: credit fraud detection.
Modern software applications are often composed of distributed microservices. Consider typical Software as a Service (SaaS) applications, which are accessed through web interfaces and run in the cloud. Partly because of their physically distributed nature, managing and monitoring performance in these complex systems is becoming increasingly difficult. When issues such as performance degradation arise, identifying and debugging the root causes can be challenging. At Ericsson's Global AI Accelerator, we're exploring data-science-based monitoring solutions that can learn to identify and categorize anomalous system behavior and thereby reduce incident resolution times.
Anomaly detection is the identification of rare occurrences, items, or events of concern whose characteristics differ from the majority of the processed data. Anomalies, also called outliers, can indicate security errors, structural defects, and even bank fraud or medical problems. There are three main forms of anomaly detection. The first is unsupervised anomaly detection, which finds anomalies in an unlabeled data set by comparing data points to each other, establishing a baseline of "normal" behavior for the data, and looking for points that differ from that baseline.
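To make the unsupervised idea concrete, here is a minimal, standard-library-only Python sketch that is not taken from the original text. It scores each point by its mean distance to its k nearest neighbours, so no labels are needed and every point is judged only against the others. The function names `knn_outlier_scores` and `flag_anomalies`, the default k, and the "3 times the median score" cut-off are hypothetical choices made for illustration, not a recommended configuration.

```python
import math
import statistics
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by its mean Euclidean distance to its k nearest neighbours.

    No labels are used: each point is compared only with the other points,
    so a high score means the point lies far from the bulk of the data.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points: Sequence[Sequence[float]], k: int = 2, factor: float = 3.0) -> List[int]:
    """Return indices whose score exceeds `factor` times the median score.

    The median acts as the "normal" baseline (hypothetical rule for this sketch).
    """
    scores = knn_outlier_scores(points, k)
    baseline = statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > factor * baseline]


points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 8.0)]
print(flag_anomalies(points))  # [5]
```

On these six sample points, only the isolated point at (8.0, 8.0) is flagged. The median is used as the baseline because the anomaly's own large score would pull a mean upward.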
During the past decade, many anomaly detection approaches have been introduced in fields such as network monitoring, fraud detection, and intrusion detection. However, they require an understanding of data patterns and often need a long offline period to build a model or network for the target data. Providing real-time, proactive anomaly detection for streaming time series without human intervention or domain knowledge is highly valuable: it greatly reduces human effort and allows appropriate countermeasures to be taken before serious damage, a failure, or another harmful event occurs. However, this problem has not yet been well studied. To address it, this paper proposes RePAD, a Real-time Proactive Anomaly Detection algorithm for streaming time series based on Long Short-Term Memory (LSTM). RePAD uses short-term historical data points to predict the upcoming data point and determine whether it is a sign that an anomaly is likely to happen in the near future. By dynamically adjusting the detection threshold over time, RePAD tolerates minor pattern changes in the time series and detects anomalies either proactively or on time. Experiments on two time series datasets from the Numenta Anomaly Benchmark demonstrate that RePAD can proactively detect anomalies and provide early warnings in real time without human intervention or domain knowledge.
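The core loop described above (predict the next point from a short window of history, then compare the prediction error against a threshold that is recomputed as the stream advances) can be sketched as follows. This is not RePAD itself. As a clearly labeled assumption, a moving average of the last few values stands in for the LSTM predictor, and the threshold is taken as the mean plus three standard deviations of the most recent relative errors. The function `detect_stream` and its parameters are hypothetical.

```python
import statistics
from collections import deque
from typing import Iterable, List


def detect_stream(values: Iterable[float], lookback: int = 3,
                  err_window: int = 10, sigmas: float = 3.0) -> List[int]:
    """Return indices of points flagged as anomalous in a streaming series.

    Assumptions for this sketch (not taken from the RePAD paper):
    - predictor: mean of the last `lookback` observations (RePAD uses an LSTM);
    - threshold: mean + `sigmas` * stdev of the last `err_window` relative errors,
      recomputed at every step so it follows gradual pattern changes.
    """
    history = deque(maxlen=lookback)   # short-term history fed to the predictor
    errors = deque(maxlen=err_window)  # recent relative prediction errors
    flagged = []
    for t, x in enumerate(values):
        if len(history) == lookback:
            # Stand-in predictor (assumption): RePAD would query its LSTM here.
            pred = sum(history) / lookback
            err = abs(x - pred) / max(abs(x), 1e-9)  # relative prediction error
            if len(errors) >= 2:  # stdev needs at least two samples
                threshold = statistics.mean(errors) + sigmas * statistics.stdev(errors)
                if err > threshold:
                    flagged.append(t)
            errors.append(err)
        history.append(x)
    return flagged


series = [10, 11, 10, 11, 10, 11, 10, 11, 10, 11, 10, 11, 50, 11, 10, 11]
print(detect_stream(series)[0])  # 12
```

On this series, which alternates between 10 and 11 with a single spike to 50, the first flagged index is 12. Points right after the spike may also be flagged, because the spike enters the lookback window and distorts the next predictions. Since the threshold is rebuilt from a sliding window of recent errors, a slow drift in the series tends to move the threshold along with it, which is the intuition behind a dynamic threshold.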
Anomaly detection is a technique for identifying unusual patterns that do not conform to the expected behavior of the data. These unusual occurrences are also called outliers. One application of anomaly detection is intrusion detection, where it identifies unusual patterns in network traffic that can signal a system hack. Another field where anomaly detection is used is health monitoring, where it can help detect a malignant tumor in an MRI scan.