DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

arXiv.org Machine Learning

Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which have been used to enlarge training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples, that is, normal samples that occur with small probability (e.g., rare normal events). We show that these samples are responsible for false positives in anomaly detection. However, oversampling infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. We then systematically oversample at the 'edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is unified: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets.
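
A minimal sketch of the oversampling step described above, assuming an adversarial autoencoder has already been trained so that normal data map to an approximately unit-Gaussian latent prior. The decoder architecture, the 0.95 tail quantile, and all layer sizes below are illustrative assumptions rather than the authors' exact design.

```python
# Sketch: sample latent codes from the low-density "edge" of the AAE prior and
# decode them to synthesize additional infrequent normal samples.
import torch
import torch.nn as nn

LATENT_DIM = 8

class Decoder(nn.Module):
    """Toy stand-in for the trained AAE decoder (latent code -> data space)."""
    def __init__(self, latent_dim=LATENT_DIM, data_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, data_dim),
        )

    def forward(self, z):
        return self.net(z)

def sample_latent_tail(n_samples, latent_dim=LATENT_DIM, tail_quantile=0.95):
    """Rejection-sample standard-normal codes whose norm exceeds a quantile,
    i.e. codes from the well-defined tail of the unimodal latent prior."""
    ref_norms = torch.randn(100_000, latent_dim).norm(dim=1)
    threshold = torch.quantile(ref_norms, tail_quantile)

    accepted = []
    while sum(z.shape[0] for z in accepted) < n_samples:
        z = torch.randn(4 * n_samples, latent_dim)
        accepted.append(z[z.norm(dim=1) > threshold])
    return torch.cat(accepted)[:n_samples]

# Decode tail codes into synthetic "infrequent normal" samples and append them
# to the training set of the downstream anomaly detector.
decoder = Decoder()                      # in practice: the trained AAE decoder
z_edge = sample_latent_tail(n_samples=256)
augmented_samples = decoder(z_edge).detach()
print(augmented_samples.shape)           # torch.Size([256, 32])
```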


Domain Adaptation for Vehicle Detection from Bird's Eye View LiDAR Point Cloud Data

arXiv.org Artificial Intelligence

Point cloud data from 3D LiDAR sensors are one of the most crucial sensor modalities for safety-critical applications such as self-driving vehicles. Since annotating point cloud data is an expensive and time-consuming process, the use of simulated environments and 3D LiDAR sensors for this task has recently gained popularity. With simulated sensors and environments, obtaining annotated synthetic point cloud data becomes much easier. However, the generated synthetic point cloud data still lack the artefacts that usually exist in point cloud data from real 3D LiDAR sensors. As a result, models trained on such data for perception tasks degrade when tested on real point cloud data because of the domain shift between simulated and real environments. Thus, in this work, we propose a domain adaptation framework for bridging this gap between synthetic and real point cloud data. Our proposed framework is based on the deep cycle-consistent generative adversarial network (CycleGAN) architecture. We evaluate the performance of our proposed framework on the task of vehicle detection from bird's eye view (BEV) point cloud images coming from real 3D LiDAR sensors. The framework shows competitive results, with an improvement of more than 7% in average precision over other baseline approaches when tested on real BEV point cloud images.
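
A minimal sketch of a CycleGAN-style sim-to-real translation step for BEV LiDAR rasters, assuming single-channel bird's-eye-view images. The network shapes, loss weight, and variable names are illustrative assumptions; the paper's exact architecture may differ.

```python
# Sketch: unpaired translation between synthetic and real BEV images with a
# cycle-consistency constraint, as in the CycleGAN framework.
import torch
import torch.nn as nn

def conv_generator(channels=1):
    """Tiny fully-convolutional translator (stand-in for a CycleGAN generator)."""
    return nn.Sequential(
        nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
    )

def conv_discriminator(channels=1):
    """PatchGAN-style discriminator producing a grid of real/fake scores."""
    return nn.Sequential(
        nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1),
    )

G_sim2real = conv_generator()    # translates synthetic BEV -> realistic BEV
G_real2sim = conv_generator()    # translates real BEV -> synthetic-looking BEV
D_real = conv_discriminator()    # judges whether a BEV image looks real

mse, l1 = nn.MSELoss(), nn.L1Loss()
lambda_cycle = 10.0              # assumed cycle-consistency weight

# Random tensors stand in for unpaired batches of synthetic and real BEV images.
bev_sim = torch.rand(4, 1, 64, 64)
bev_real = torch.rand(4, 1, 64, 64)

# Generator objective: fool D_real and reconstruct the synthetic input after a
# round trip sim -> real -> sim (cycle consistency).
fake_real = G_sim2real(bev_sim)
d_out = D_real(fake_real)
adv_loss = mse(d_out, torch.ones_like(d_out))
cycle_loss = l1(G_real2sim(fake_real), bev_sim)
gen_loss = adv_loss + lambda_cycle * cycle_loss
print(float(gen_loss))
# The translated images would then be used to train (or fine-tune) the BEV
# vehicle detector, narrowing the synthetic-to-real domain gap.
```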


GAN-based method for cyber-intrusion detection

arXiv.org Machine Learning

Ubiquitous cyber-intrusions constantly endanger the security of our devices. They may cause irreversible damage to the system and lead to leakage of private information. Thus, it is of vital importance to detect these intrusions promptly. Traditional methods such as Decision Trees and the Support Vector Machine (SVM) are used to classify normal internet connections and cyber-intrusions. However, intrusions are far fewer than normal connections, which limits the capability of these methods. Anomaly detection methods such as Isolation Forest can handle the imbalanced data. Nevertheless, as the number of features increases, these methods lack the capacity to learn the data distribution. The generative adversarial network (GAN) has been proposed to solve the above issues. With its strong generative ability, it only needs to learn the distribution of the normal state and can identify the abnormal state when an intrusion occurs. However, existing models are not suited to processing discrete values, leading to severe degradation of detection performance. To cope with these challenges, in this paper we propose a novel GAN-based model with a specifically designed loss function to detect cyber-intrusions. Experimental results show that our model outperforms state-of-the-art models and remarkably reduces the overhead.
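
A minimal sketch of GAN-style intrusion scoring in the spirit described above: a generator/encoder pair is assumed to have been trained on normal connections only, and a record is flagged when it cannot be reconstructed well. The networks, feature dimension, and threshold rule are illustrative assumptions and do not reproduce the paper's specifically designed loss.

```python
# Sketch: score connections by reconstruction error under a normal-only model,
# then flag records whose score exceeds a threshold calibrated on normal data.
import torch
import torch.nn as nn

FEATURES = 41  # e.g. the feature count of a KDD-style connection record (assumed)

encoder = nn.Sequential(nn.Linear(FEATURES, 16), nn.ReLU(), nn.Linear(16, 4))
generator = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, FEATURES))

def anomaly_score(x):
    """Reconstruction error of x under the normal-data model."""
    with torch.no_grad():
        return ((generator(encoder(x)) - x) ** 2).mean(dim=1)

# Calibrate a threshold on held-out normal traffic, then flag new connections.
normal_holdout = torch.rand(1000, FEATURES)     # stand-in for normal records
threshold = torch.quantile(anomaly_score(normal_holdout), 0.99)

new_connections = torch.rand(8, FEATURES)
is_intrusion = anomaly_score(new_connections) > threshold
print(is_intrusion)
```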


One-to-one Mapping for Unpaired Image-to-image Translation

arXiv.org Artificial Intelligence

Recently, image-to-image translation has attracted significant interest in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of the cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function by simply augmenting the training samples with input-output swapped pairs during training and using a separate cycle consistency loss for each mapping direction. The outcome of such learning is a provably one-to-one mapping function. Our extensive experiments on a variety of datasets, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear improvement over the CycleGAN method both qualitatively and quantitatively. In particular, our proposed method achieves the state-of-the-art result on the Cityscapes benchmark dataset for unpaired label-to-photo image translation.
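
A minimal sketch of the self-inverse idea as I read it from the abstract: a single generator G serves both translation directions, trained on batches from both domains (the swapped input-output augmentation), with a separate cycle consistency term per direction. Network sizes, the loss composition, and names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: one shared translator applied twice should return its input, giving a
# self-inverse (and hence one-to-one) mapping between the two domains.
import torch
import torch.nn as nn

G = nn.Sequential(                     # one shared translator for A->B and B->A
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)
l1 = nn.L1Loss()

# Unpaired batches from the two image domains (random stand-ins here).
batch_a = torch.rand(2, 3, 64, 64)
batch_b = torch.rand(2, 3, 64, 64)

# Separate cycle consistency loss for each direction, both enforced through the
# same network: A -> B -> A and B -> A -> B.
cycle_a = l1(G(G(batch_a)), batch_a)
cycle_b = l1(G(G(batch_b)), batch_b)
self_inverse_loss = cycle_a + cycle_b  # added to the usual adversarial losses
print(float(self_inverse_loss))
```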


Learning Useful System Call Attributes for Anomaly Detection

AAAI Conferences

The current sequence is then examined (using the model) for anomalous behavior, which could correspond to attacks. Though these techniques have been shown to be quite effective, a key element seems to be missing: the inclusion and utilization of system call arguments. Recent research shows that sequence-based systems are prone to evasion. We propose learning different representations for system call arguments. Results indicate that this information can be effectively used to detect more attacks with reasonable space and time overhead.
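
A minimal sketch of the kind of argument attributes such a system might derive, since the abstract does not specify them: simple per-argument features (length, whether it is a path, its directory prefix) attached to each system call in a trace. The attribute set and representation below are illustrative assumptions only.

```python
# Sketch: turn (system call, arguments) pairs into attribute records that a
# sequence-based anomaly detector could consume alongside the call names.
import os

def argument_attributes(syscall, args):
    """Derive coarse attributes from a system call's arguments."""
    attrs = {"call": syscall, "n_args": len(args)}
    for i, a in enumerate(str(x) for x in args):
        attrs[f"arg{i}_len"] = len(a)
        attrs[f"arg{i}_is_path"] = a.startswith("/")
        attrs[f"arg{i}_dir"] = os.path.dirname(a) if a.startswith("/") else ""
    return attrs

# Example: a short trace of (call, arguments) pairs.
trace = [("open", ["/etc/passwd", "O_RDONLY"]), ("write", [1, "hello"])]
print([argument_attributes(c, a) for c, a in trace])
```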