Deep Variational Semi-Supervised Novelty Detection
Daniel, Tal, Kurutach, Thanard, Tamar, Aviv
Abstract. In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to 'separate' between latent vectors for normal and outlier data. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, and can be combined with any VAE model architecture. When comparing to state-of-the-art SSAD methods that are not specific to particular data types, we obtain marked improvement in outlier detection.
In the common formulation of AD, training data is provided only for normal samples, while at test time, anomalous samples need to be detected. In the probabilistic AD approach, a model of the normal data distribution is learned, and the likelihood of a test sample under this model is thresholded for classification as normal or not. Recently, deep generative models such as variational autoencoders (VAEs; Kingma & Welling, 2013) and generative adversarial networks (Goodfellow et al., 2014) have shown promise for learning data distributions in AD (An & Cho, 2015; Suh et al., 2016; Schlegl et al., 2017; Wang et al., 2017). Here, we consider the setting of semi-supervised AD (SSAD), where in addition to the normal samples, a small sample of labeled anomalies is provided (Görnitz et al., 2013). Most importantly, this set is too small to represent the range of possible anomalies, making classification methods (either supervised or semi-supervised) unsuitable. Instead, most approaches are based on 'fixing' an unsupervised AD method to correctly classify the labeled anomalies, while still maintaining AD capabilities for unseen outliers (e.g., Görnitz et al., 2013; Muñoz-Marí et al., 2010; Ruff et al., 2019).
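The probabilistic AD approach described above (learn a model of the normal data, then threshold the likelihood of a test sample) can be illustrated with a minimal sketch. This is not the VAE-based method of the paper: as a stated assumption, a one-dimensional Gaussian stands in for the learned density model, and the function names and the 1% quantile used to set the threshold are hypothetical choices made only for illustration.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Fit a 1-D Gaussian to normal-only training samples (stand-in for a VAE)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return mu, sigma


def log_likelihood(x, mu, sigma):
    """Log-density of x under N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def choose_threshold(train_scores, quantile=0.01):
    """Assumed rule: use a low quantile of the training log-likelihoods."""
    ordered = sorted(train_scores)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def is_anomaly(x, mu, sigma, threshold):
    """Flag a test sample whose log-likelihood falls below the threshold."""
    return log_likelihood(x, mu, sigma) < threshold


# Synthetic "normal" training data; a fixed seed keeps the sketch reproducible.
rng = random.Random(0)
normal_train = [rng.gauss(0.0, 1.0) for _ in range(1000)]

mu, sigma = fit_gaussian(normal_train)
train_scores = [log_likelihood(x, mu, sigma) for x in normal_train]
threshold = choose_threshold(train_scores)
```

With this setup, a sample far from the training data, such as `is_anomaly(8.0, mu, sigma, threshold)`, is flagged, while a typical sample such as `0.0` is not. In the semi-supervised setting, the few labeled anomalies would additionally be used to adjust the model itself rather than only the threshold.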
Nov-12-2019