Waterborne diseases affect more than 2 billion people worldwide, causing substantial economic burden. For example, the treatment of waterborne diseases costs more than $2 billion annually in the United States alone, with 90 million cases recorded per year. Among waterborne pathogen-related problems, one of the most common public health concerns is the presence of total coliform bacteria and Escherichia coli (E. Traditional culture-based bacteria detection methods often take 24-48 hours, followed by visual inspection and colony counting by an expert, according to the United States Environmental Protection Agency (EPA) guidelines. Alternatively, molecular detection methods based on, for example, the amplification of nucleic acids, can reduce the detection time to a few hours, but they generally lack the sensitivity for detecting bacteria at very low concentrations, and are not capable of differentiating between live and dead microorganisms.
Anomaly Detection is a machine learning technique used for identifying rare items, events, or observations by checking for rows in the table that differ significantly from the majority of the rows. Typically, the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problem or error. Some common business use cases for anomaly detection are: Fraud detection (credit cards, insurance, etc.) using financial data.
Learning classifiers for misuse and anomaly detection using a bag of system calls representation. Anomaly detection in health data based on deep learning. Abnormal human activity recognition using SVM based approach. Anomaly detection of gas turbines based on normal pattern extraction. Contextual anomaly detection for a critical industrial system based on logs and metrics.
Today during its annual IBM Think conference, IBM announced the launch of Watson AIOps, a service that taps AI to automate the real-time detection, diagnosing, and remediation of network anomalies. It also unveiled new offerings targeting the rollout of 5G technologies and the devices on those networks, as well as a coalition of telecommunications partners -- the IBM Telco Network Cloud Ecosystem -- that will work with IBM to deploy edge computing technologies. Watson AIOps marks IBM's foray into the mammoth AIOps market, which is expected to grow from $2.55 billion in 2018 to $11.02 billion by 2023, according to Markets and Markets. That might be a conservative projection in light of the pandemic, which is forcing IT teams to increasingly conduct their work remotely. In lieu of access to infrastructure, tools like Watson AIOps could help prevent major outages, the cost of which a study from Aberdeen pegged at $260,000 per hour.
An Anomaly is by definition something that is outside the norm or what is expected. For data this can mean rare individual outliers or distinct clusters. Anomaly detection is an important capability with broad applicability in many domains such as medical diagnostics or in detection of intrusions, fraud, or false information. All three categories of model training are used for anomalous data; supervised, semi-supervised, and unsupervised. Typically the first go-to methods are statistical and classical machine learning techniques.
Anomaly detection is a prominent data preprocessing step in learning applications for correction and/or removal of faulty data. Automating this data type with the use of autoencoders could increase the quality of the dataset by isolating anomalies that were missed through manual or basic statistical analysis. A Simple, Deep, and Supervised Deep Autoencoder were trained and compared for anomaly detection over the ASHRAE building energy dataset. Given the restricted parameters under which the models were trained, the Deep Autoencoder perfoms the best, however, the Supervised Deep Autoencoder outperforms the other models in total anomalies detected when considerations for the test datasets are given.
Anomaly detection for time-series data has been an important research field for a long time. Seminal work on anomaly detection methods has been focussing on statistical approaches. In recent years an increasing number of machine learning algorithms have been developed to detect anomalies on time-series. Subsequently, researchers tried to improve these techniques using (deep) neural networks. In the light of the increasing number of anomaly detection methods, the body of research lacks a broad comparative evaluation of statistical, machine learning and deep learning methods. This paper studies 20 univariate anomaly detection methods from the all three categories. The evaluation is conducted on publicly available datasets, which serve as benchmarks for time-series anomaly detection. By analyzing the accuracy of each method as well as the computation time of the algorithms, we provide a thorough insight about the performance of these anomaly detection approaches, alongside some general notion of which method is suited for a certain type of data.
We present a novel model called One Class Minimum Spanning Tree (OCmst) for novelty detection problem that uses a Convolutional Neural Network (CNN) as deep feature extractor and graph-based model based on Minimum Spanning Tree (MST). In a novelty detection scenario, the training data is no polluted by outliers (abnormal class) and the goal is to recognize if a test instance belongs to the normal class or to the abnormal class. Our approach uses the deep features from CNN to feed a pair of MSTs built starting from each test instance. To cut down the computational time we use a parameter $\gamma$ to specify the size of the MST's starting to the neighbours from the test instance. To prove the effectiveness of the proposed approach we conducted experiments on two publicly available datasets, well-known in literature and we achieved the state-of-the-art results on CIFAR10 dataset.
In terms of Generative Adversarial Networks (GANs), the information metric to discriminate the generative data and the real data, lies in the key point of generation efficiency, which plays an important role in GAN-based applications, especially in anomaly detection. As for the original GAN, the information metric based on Kullback-Leibler (KL) divergence has limitations on rare events generation and training performance for adversarial networks. Therefore, it is significant to investigate the metrics used in GANs to improve the generation ability as well as bring gains in the training process. In this paper, we adopt the exponential form, referred from the Message Importance Measure (MIM), to replace the logarithm form of the original GAN. This approach named MIM-based GAN, has dominant performance on training process and rare events generation. Specifically, we first discuss the characteristics of training process in this approach. Moreover, we also analyze its advantages on generating rare events in theory. In addition, we do simulations on the datasets of MNIST and ODDS to see that the MIM-based GAN achieves state-of-the-art performance on anomaly detection compared with some classical GANs.
Many time series data mining problems can be solved with repeated use of distance measure. Examples of such tasks include similarity search, clustering, classification, anomaly detection and segmentation. For over two decades it has been known that the Dynamic Time Warping (DTW) distance measure is the best measure to use for most tasks, in most domains. Because the classic DTW algorithm has quadratic time complexity, many ideas have been introduced to reduce its amortized time, or to quickly approximate it. One of the most cited approximate approaches is FastDTW. The FastDTW algorithm has well over a thousand citations and has been explicitly used in several hundred research efforts. In this work, we make a surprising claim. In any realistic data mining application, the approximate FastDTW is much slower than the exact DTW. This fact clearly has implications for the community that uses this algorithm: allowing it to address much larger datasets, get exact results, and do so in less time. Our observation also has a more sobering lesson for the community. This work may serve as a reminder to the community to exercise more caution in uncritically accepting published results.