AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in wide variety of contexts. The underlying algorithm – referred to as Seasonal Hybrid ESD (S-H-ESD) builds upon the Generalized ESD test for detecting anomalies. Note that S-H-ESD can be used to detect both global as well as local anomalies. This is achieved by employing time series decomposition and using robust statistical metrics, viz., median together with ESD.
This paper proposes a novel optimization principle and its implementation for unsupervised anomaly detection in sound (ADS) using an autoencoder (AE). The goal of unsupervised-ADS is to detect unknown anomalous sound without training data of anomalous sound. Use of an AE as a normal model is a state-of-the-art technique for unsupervised-ADS. To decrease the false positive rate (FPR), the AE is trained to minimize the reconstruction error of normal sounds and the anomaly score is calculated as the reconstruction error of the observed sound. Unfortunately, since this training procedure does not take into account the anomaly score for anomalous sounds, the true positive rate (TPR) does not necessarily increase. In this study, we define an objective function based on the Neyman-Pearson lemma by considering ADS as a statistical hypothesis test. The proposed objective function trains the AE to maximize the TPR under an arbitrary low FPR condition. To calculate the TPR in the objective function, we consider that the set of anomalous sounds is the complementary set of normal sounds and simulate anomalous sounds by using a rejection sampling algorithm. Through experiments using synthetic data, we found that the proposed method improved the performance measures of ADS under low FPR conditions. In addition, we confirmed that the proposed method could detect anomalous sounds in real environments.
Competitive learning is an unsupervised algorithm that classifies input patterns intomutually exclusive clusters. In a neural net framework, each cluster is represented by a processing unit that competes with others in a winnertake-all poolfor an input pattern. I present a simple extension to the algorithm that allows it to construct discrete, distributed representations. Discrete representations are useful because they are relatively easy to analyze and their information content can readily be measured. Distributed representations areuseful because they explicitly encode similarity. The basic idea is to apply competitive learning iteratively to an input pattern, and after each stage to subtract from the input pattern the component that was captured in the representation at that stage. This component is simply the weight vector of the winning unit of the competitive pool. The subtraction procedure forces competitive pools at different stages to encode different aspects of the input. The algorithm is essentially the same as a traditional data compression technique knownas multistep vector quantization, although the neural net perspective suggestspotentially powerful extensions to that approach.
An Autoencoder is neural network capable of unsupervised feature learning. Neural networks are typically used for supervised learning problems, trying to predict a target vector y from input vectors x. An Autoencoder network, however, tries to predict x from x, without the need for labels. Here the challenge is recreating the essence of the original input from compressed, noisy or corrupted data. The idea behind the Autoencoder is to build a network with a narrow hidden layer between Encoder and Decoder that serves as a compressed representation of the input data.
Nonnegative matrix factorization (NMF) has an established reputation as a useful data analysis technique in numerous applications. However, its usage in practical situations is undergoing challenges in recent years. The fundamental factor to this is the increasingly growing size of the datasets available and needed in the information sciences. To address this, in this work we propose to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants: classical and separable. In separable NMF (SNMF) the left factors are a subset of the columns of the input matrix. We present suitable formulations for each problem, dealing with different representative algorithms within each one. We show that the resulting compressed techniques are faster than their uncompressed variants, vastly reduce memory demands, and do not encompass any significant deterioration in performance. The proposed structured random projections for SNMF allow to deal with arbitrarily shaped large matrices, beyond the standard limit of tall-and-skinny matrices, granting access to very efficient computations in this general setting. We accompany the algorithmic presentation with theoretical foundations and numerous and diverse examples, showing the suitability of the proposed approaches.