The Effect of Optimization Methods on the Robustness of Out-of-Distribution Detection Approaches

Deep neural networks (DNNs) have become the de facto learning mechanism in many domains. However, their tendency to behave unreliably on out-of-distribution (OOD) inputs hinders their adoption in critical domains. Several approaches have been proposed for detecting OOD inputs, but existing approaches still lack robustness. In this paper, we shed light on the robustness of OOD detection (OODD) approaches by revealing the important role of optimization methods. We show that OODD approaches are sensitive to the type of optimization method used to train deep models. Because the training problem is non-convex, different optimization methods can converge to different solutions, and these solutions may or may not satisfy the assumptions (e.g., about the distribution of deep features) made by OODD approaches. Furthermore, we propose a robustness score that takes the role of optimization methods into account, providing a principled way to compare OODD approaches. In addition to comparing several OODD approaches using our proposed robustness score, we demonstrate that some optimization methods provide better solutions for OODD approaches.
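The abstract does not define its robustness score, so the following is only a minimal sketch of the kind of comparison it motivates: evaluate one OOD detector's AUROC on models trained with different optimizers, then summarize how much that number moves. The function names, the "higher score means more likely OOD" convention, and the mean/std/worst-case summary are all hypothetical choices for illustration, not the paper's score.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC of an OOD detector, assuming a higher score means 'more likely OOD'.

    Computed as the Mann-Whitney U statistic: the probability that a random OOD
    sample scores higher than a random in-distribution sample (ties count 1/2).
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (ood_scores[:, None] > id_scores[None, :]).sum()
    ties = (ood_scores[:, None] == id_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (len(ood_scores) * len(id_scores))

def summarize_across_optimizers(auroc_by_optimizer):
    """Hypothetical summary of one detector's AUROC across training optimizers."""
    values = np.array(list(auroc_by_optimizer.values()), dtype=float)
    return {"mean": values.mean(), "std": values.std(), "worst": values.min()}

# Synthetic detector scores for two hypothetical optimizers (illustration only,
# not experimental results).
rng = np.random.default_rng(0)
scores = {
    "optimizer_A": (rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)),
    "optimizer_B": (rng.normal(0.0, 1.0, 500), rng.normal(0.5, 1.0, 500)),
}
per_optimizer = {name: auroc(i, o) for name, (i, o) in scores.items()}
print(summarize_across_optimizers(per_optimizer))
```

Reporting the spread and the worst case alongside the mean makes a detector's dependence on the optimizer visible, which a single AUROC from one training run hides.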

An Anomaly Contribution Explainer for Cyber-Security Applications

In this paper, we introduce the Anomaly Contribution Explainer (ACE), a tool that explains security anomaly detection models in terms of the model features through a regression framework, and its variant ACE-KL, which highlights the most important anomaly contributors. ACE and ACE-KL help diagnose which attributes contribute significantly to an anomaly by building a specialized linear model that locally approximates the anomaly score generated by a black-box model. We conducted experiments applying these methods to anomaly detection models that detect security anomalies on both synthetic and real data. In particular, we evaluate performance on three public data sets: CERT insider threat, netflow logs, and Android malware. The experimental results are encouraging: our methods consistently identify the correct contributing feature in the synthetic data, where ground truth is available; similarly, on real data sets, our methods point a security analyst toward the underlying causes of an anomaly, in one case leading to the discovery of previously overlooked network scanning activity. We have made our source code publicly available.

Cyber-security is a key concern for both private and public organizations, given the high cost of security compromises and attacks; malicious cyber-activity cost the U.S. economy between $57 billion and $109 billion in 2016 [1]. As a result, spending on security research and development, and on security products and services to detect and combat cyber-attacks, has been increasing [2]. Organizations produce large amounts of network, host, and application data that can be used to gain insights into cyber-security threats, misconfigurations, and network operations. While security domain experts can manually sift through some of this data to spot and understand attacks, doing so at scale is virtually impossible: even a medium-sized enterprise can produce terabytes of data in a few hours.
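ACE is described as fitting a specialized linear model that locally approximates a black-box anomaly score. The sketch below shows the general idea with a generic, LIME-style weighted linear surrogate; it is not ACE's specialized model, and it does not model the KL-based regularization of ACE-KL. The function name, the Gaussian perturbation scheme, and the proximity kernel are assumptions made for illustration.

```python
import numpy as np

def local_linear_contributions(score_fn, x, scale, n_samples=2000,
                               kernel_width=1.0, seed=0):
    """Hypothetical local linear surrogate of a black-box anomaly score.

    score_fn: black box mapping an (n, d) array to n anomaly scores.
    x:        the (d,) point whose anomaly score we want to explain.
    scale:    (d,) per-feature standard deviation of the perturbations.
    Returns the per-feature slopes of a weighted linear fit around x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    d = x.shape[0]
    # Perturb x and query the black-box model at the perturbed points.
    Z = x + rng.normal(size=(n_samples, d)) * scale
    y = np.asarray(score_fn(Z), dtype=float)
    # Weight each sample by its proximity to x (Gaussian kernel on the
    # distance measured in units of the perturbation scale).
    dist2 = (((Z - x) / scale) ** 2).sum(axis=1)
    w = np.exp(-dist2 / (2.0 * kernel_width ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept, keep the per-feature slopes

# Toy black box whose score depends mostly on feature 1 (illustration only).
toy_score = lambda Z: 5.0 * Z[:, 1] ** 2 + 0.1 * Z[:, 0]
slopes = local_linear_contributions(toy_score, x=[0.0, 1.0], scale=[0.1, 0.1])
print(slopes)  # feature 1 should dominate near x = [0, 1]
```

Ranking features by the magnitude of their local slopes gives an analyst a starting point for which attributes drive a particular anomaly score.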

Robust Variational Autoencoder for Tabular Data with Beta Divergence

We propose a robust variational autoencoder with $\beta$ divergence for tabular data with mixed categorical and continuous features (RTVAE). Variational autoencoders (VAEs) and their variants are popular frameworks for anomaly detection. The primary assumption is that VAEs can learn representations of normal patterns, so that any deviation from them can indicate an anomaly. However, the training data itself can contain outliers. Sources of outliers in training data include the data collection process itself (random noise) and malicious attackers (data poisoning) who aim to degrade the performance of the machine learning model. In either case, these outliers can disproportionately affect the training of VAEs and may lead to wrong conclusions about what normal behavior is. In this work, we derive a novel variational autoencoder for tabular data sets with categorical and continuous features that is robust to outliers in the training data. Our results on anomaly detection in network traffic datasets demonstrate the effectiveness of our approach.
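To illustrate why a $\beta$ divergence gives robustness to training outliers, the sketch below computes the standard per-sample density power ($\beta$) cross-entropy, $-\frac{\beta+1}{\beta}\,p(x)^{\beta} + \int p^{1+\beta}$, for an isotropic Gaussian (continuous features) and a categorical distribution (one categorical feature). It covers only a reconstruction-style term; how RTVAE actually derives and combines its objective with the rest of the VAE is not reproduced here, and the function names and the isotropic-Gaussian assumption are illustrative.

```python
import numpy as np

def gaussian_beta_loss(x, mu, sigma, beta):
    """Beta cross-entropy of x under N(mu, sigma^2 I) (isotropic, assumed)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = x.size
    sq = np.sum((x - mu) ** 2)
    norm = (2.0 * np.pi * sigma ** 2) ** (-d * beta / 2.0)
    p_beta = norm * np.exp(-beta * sq / (2.0 * sigma ** 2))  # p(x)^beta
    integral = norm * (1.0 + beta) ** (-d / 2.0)             # integral of p^(1+beta)
    return -(beta + 1.0) / beta * p_beta + integral

def categorical_beta_loss(probs, label, beta):
    """Beta cross-entropy of an observed category under predicted probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -(beta + 1.0) / beta * probs[label] ** beta + np.sum(probs ** (1.0 + beta))

def gaussian_nll(x, mu, sigma):
    """Ordinary negative log-likelihood under N(mu, sigma^2 I), for comparison."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sq = np.sum((x - mu) ** 2)
    return 0.5 * sq / sigma ** 2 + 0.5 * x.size * np.log(2.0 * np.pi * sigma ** 2)

# A mixed record: the loss is the Gaussian term plus one term per categorical feature.
record_loss = (gaussian_beta_loss([0.2, -0.1], [0.0, 0.0], 1.0, 0.5)
               + categorical_beta_loss([0.8, 0.1, 0.1], 0, 0.5))
print(record_loss)
```

The design point is bounded influence: as a continuous value moves far from the reconstruction, $p(x)^{\beta} \to 0$ and the $\beta$ loss saturates at the integral term, whereas the negative log-likelihood keeps growing quadratically, so a single poisoned or noisy row cannot dominate the gradient.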

GAN-based method for cyber-intrusion detection

Ubiquitous cyber-intrusions constantly endanger the security of our devices. They may cause irreversible damage to systems and leak private information. Thus, it is vital to detect these intrusions promptly. Traditional methods such as Decision Trees and Support Vector Machines (SVMs) have been used to classify normal internet connections and cyber-intrusions. However, intrusions are far less frequent than normal connections, and this imbalance limits the capability of these methods. Anomaly detection methods such as Isolation Forest can handle imbalanced data. Nevertheless, as the number of features grows, these methods struggle to learn the data distribution. Generative adversarial networks (GANs) have been proposed to address these issues. Owing to their strong generative ability, GANs only need to learn the distribution of normal behavior and can then identify abnormal behavior when an intrusion occurs. However, existing GAN-based models are poorly suited to discrete values, which severely degrades detection performance. To cope with these challenges, in this paper we propose a novel GAN-based model with a specifically designed loss function to detect cyber-intrusions. Experimental results show that our model outperforms state-of-the-art models and substantially reduces overhead.
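The abstract identifies discrete values as a weak point of existing GAN-based detectors but does not describe its own loss function. One widely used, general technique for letting a GAN generator emit categorical features (for example, a protocol or flag field) is the Gumbel-softmax relaxation; the sketch below shows that swapped-in technique, not the paper's method. It is written in NumPy to show the arithmetic only; inside a GAN it would be implemented in an autodiff framework so gradients can flow through the soft samples.

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Soft, approximately one-hot samples from a categorical distribution.

    logits: (..., k) unnormalized log-probabilities, e.g. a generator's output
            for one k-valued discrete feature (hypothetical usage).
    temperature: > 0; lower values push samples closer to one-hot vectors.
    """
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise; the lower bound avoids log(0).
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Illustration: a 3-valued discrete feature with probabilities 0.7 / 0.2 / 0.1.
rng = np.random.default_rng(0)
logits = np.log(np.tile([0.7, 0.2, 0.1], (20000, 1)))
samples = gumbel_softmax(logits, temperature=0.1, rng=rng)
freq = np.bincount(samples.argmax(axis=1), minlength=3) / len(samples)
print(freq)  # close to [0.7, 0.2, 0.1]
```

Taking the argmax of the soft sample recovers an exact categorical draw (the Gumbel-max trick), while the soft vector itself stays differentiable with respect to the logits, which is what lets a discriminator's signal reach the generator for discrete fields.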