Content delivery networks (CDNs) provide efficient content distribution over the Internet. CDNs improve the connectivity and efficiency of global communications, but their caching mechanisms may be breached by cyber-attackers. Among the security mechanisms, effective anomaly detection forms an important part of CDN security enhancement. In this work, we propose a multi-perspective unsupervised learning framework for anomaly detection in CDNs. In the proposed framework, we develop three components: a multi-perspective feature engineering approach, an optimized unsupervised anomaly detection model that utilizes an isolation forest and a Gaussian mixture model, and a multi-perspective validation method. Together, these detect abnormal behaviors in CDNs, mainly from the client Internet Protocol (IP) and node perspectives, and thereby identify denial-of-service (DoS) and cache pollution attack (CPA) patterns. Experimental results are based on an analysis of eight days of real-world CDN log data provided by a major CDN operator. In the experiments, the proposed framework effectively identifies abnormal contents, compromised nodes, malicious IPs and their corresponding attack types, and these findings were validated by multiple cybersecurity experts. This demonstrates the effectiveness of the proposed method when applied to real-world CDN data.
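The sketch below roughly illustrates the Gaussian-mixture half of such a pipeline, as a standard-library Python toy. It works in three steps:

- It aggregates log records per client IP; the same function can also group by node.
- It fits a two-component, one-dimensional Gaussian mixture to log-scaled request counts, using expectation-maximization (EM).
- It flags IPs whose posterior probability of belonging to the higher-mean component exceeds 0.5.

The following are all assumptions made for illustration:

- the record layout `(client_ip, node_id, url, cache_hit)`;
- the helper names `aggregate`, `fit_gmm_1d` and `flag_anomalies`;
- the single feature and the 0.5 threshold.

The text does not specify the framework's actual features or how its model is optimized. Nor does it say how the isolation forest and the mixture model are combined, so the isolation forest is omitted here.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical log record layout: (client_ip, node_id, url, cache_hit).
KEY_INDEX = {"ip": 0, "node": 1}

def aggregate(records, perspective):
    """Group records by client IP or by node and compute simple per-entity features."""
    idx = KEY_INDEX[perspective]
    count, misses = defaultdict(int), defaultdict(int)
    urls = defaultdict(set)
    for rec in records:
        key = rec[idx]
        count[key] += 1
        urls[key].add(rec[2])
        if not rec[3]:
            misses[key] += 1
    return {k: {"requests": count[k],
                "distinct_urls": len(urls[k]),
                "miss_ratio": misses[k] / count[k]} for k in count}

def gauss_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, iters=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    w = [0.5, 0.5]
    mu = [min(xs), max(xs)]  # initialise the means at the extremes
    var = [max(statistics.pvariance(xs), var_floor)] * 2
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                         var_floor)
    return w, mu, var

def flag_anomalies(features, name="requests", threshold=0.5):
    """Flag entities whose posterior for the higher-mean component exceeds threshold."""
    entities = sorted(features)
    xs = [math.log1p(features[e][name]) for e in entities]
    w, mu, var = fit_gmm_1d(xs)
    hi = 0 if mu[0] > mu[1] else 1
    flagged = []
    for e, x in zip(entities, xs):
        p = [w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)]
        if sum(p) > 0 and p[hi] / sum(p) > threshold:
            flagged.append(e)
    return flagged

# Toy data: 20 ordinary clients plus two clients flooding one node with
# requests for unique, uncacheable URLs (a DoS / cache-pollution-like pattern).
records = []
for i in range(20):
    for j in range(5 + i % 5):
        records.append((f"10.0.0.{i}", f"node-{i % 3}", f"/img/{j % 3}.png", True))
for ip in ("203.0.113.1", "203.0.113.2"):
    for j in range(400):
        records.append((ip, "node-0", f"/junk/{ip}/{j}", False))

ip_features = aggregate(records, "ip")
print(flag_anomalies(ip_features))  # ['203.0.113.1', '203.0.113.2']
```

Calling `aggregate(records, "node")` gives the node-perspective features, such as the high miss ratio of `node-0`. However, with only three nodes a per-feature mixture fit would not be meaningful. A node view would therefore need more entities or a different decision rule.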
Industrial Information Technology (IT) infrastructures are often vulnerable to cyberattacks. To ensure the security of computer systems in an industrial environment, effective intrusion detection systems are needed to monitor the industry's cyber-physical systems (e.g., computer networks) for malicious activity. This paper aims to build such an intrusion detection system to protect computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build an intrusion detection system for big data scenarios in the industrial domain, we use the Apache Spark framework to implement the proposed model. The model was trained on a large network traffic dataset (about 123 million instances) stored in Elasticsearch. Moreover, we evaluate the proposed model on live streaming data and find that the system can be used for real-time anomaly detection in an industrial setup. In addition, we discuss the challenges we faced while training the model on large datasets and explicitly describe how we resolved them. Our empirical evaluation covers several anomaly detection use cases on real-world network traffic data, and it shows that the proposed system is effective at detecting anomalies in big data scenarios. Finally, we evaluate the proposed model on several academic datasets and find that its performance is comparable to that of other state-of-the-art approaches.
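The abstract does not say exactly how K-Means and the Isolation Forest are combined. One plausible arrangement is to cluster the traffic feature vectors with K-Means, then fit a separate isolation forest inside each cluster. Each point is then judged against traffic with a similar profile. The sketch below assumes this arrangement.

The isolation forest follows the standard construction of Liu et al.:

- each tree splits on a random feature at a random threshold;
- the anomaly score is `2 ** (-E[h(x)] / c(psi))`, where `h(x)` is a point's path length and `c(psi)` normalizes by the subsample size `psi`.

The function names, the singleton-cluster rule and all parameter values are hypothetical. This single-machine Python toy does not reflect the Spark implementation described above.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points (Liu et al.)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]

def build_tree(pts, depth, max_depth, rng):
    """Isolation tree: split on a random (non-constant) feature at a random threshold."""
    if depth >= max_depth or len(pts) <= 1:
        return ("leaf", len(pts))
    dims = [d for d in range(len(pts[0]))
            if min(p[d] for p in pts) < max(p[d] for p in pts)]
    if not dims:  # all remaining points are identical
        return ("leaf", len(pts))
    d = rng.choice(dims)
    split = rng.uniform(min(p[d] for p in pts), max(p[d] for p in pts))
    left = [p for p in pts if p[d] < split]
    right = [p for p in pts if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node):
    """Depth at which x reaches a leaf, plus c(size) for the points left unresolved there."""
    depth = 0
    while node[0] == "node":
        _, d, split, left, right = node
        node = left if x[d] < split else right
        depth += 1
    return depth + c(node[1])

class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, pts):
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(pts))
        max_depth = math.ceil(math.log2(max(self.psi, 2)))
        self.trees = [build_tree(rng.sample(pts, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """s(x) = 2 ** (-E[h(x)] / c(psi)); values close to 1 suggest anomalies."""
        if self.psi < 2:
            return 1.0  # hypothetical rule: a singleton cluster is itself anomalous
        mean_h = sum(path_length(x, t) for t in self.trees) / self.n_trees
        return 2.0 ** (-mean_h / c(self.psi))

def kmeans_iforest_scores(points, k=2):
    """Hypothetical combination: K-Means first, then one isolation forest per cluster."""
    labels = kmeans(points, k)
    scores = [0.0] * len(points)
    for j in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        forest = IsolationForest(seed=j).fit([points[i] for i in idx])
        for i in idx:
            scores[i] = forest.score(points[i])
    return scores

# Toy traffic features: two normal profiles plus two injected outliers.
rng = random.Random(42)
traffic = [(rng.gauss(10, 2), rng.gauss(10, 2)) for _ in range(100)]
traffic += [(rng.gauss(50, 2), rng.gauss(50, 2)) for _ in range(100)]
traffic += [(30.0, 90.0), (90.0, 10.0)]  # indices 200 and 201
scores = kmeans_iforest_scores(traffic)
top2 = sorted(range(len(traffic)), key=lambda i: scores[i], reverse=True)[:2]
print(sorted(top2))
```

In this small example, `sample_size` is at least as large as each cluster. Every tree therefore sees every point in its cluster, which keeps the injected outliers inside the trees' split ranges. On large data, the usual per-tree subsampling would apply instead.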
Abstract: Online detection of anomalies in time series is a key technique in various event-sensitive scenarios such as robotic system monitoring, smart sensor networks and data center security. However, the increasing diversity of data sources and demands is making this task more challenging than ever. First, the rapid increase in unlabeled data means that supervised learning is no longer suitable in many cases. Second, a large portion of time series have complex seasonality features. Third, online anomaly detection needs to be fast and reliable. In view of this, we adopt an unsupervised, prediction-driven approach built on a backbone model that combines a series decomposition part and an inference part. We then propose a novel metric, Local Trend Inconsistency (LTI), along with a detection algorithm that efficiently computes LTI chronologically along the series and marks each data point with a score indicating its probability of being anomalous. The results show that our scheme outperforms several representative anomaly detection algorithms in the Area Under Curve (AUC) metric, with decent time efficiency.
While time series data was ubiquitous well before the big data era, many recently emerging technical scenarios, such as autonomous driving, edge computing and the Internet of Things (IoT), pose new challenges to the detection of anomalies in this type of data. Meanwhile, detection techniques that can provide early, reliable reports of anomalies have become crucial for a wide range of systems that require 24/7 monitoring. In cloud data centers, for example, a distributed monitoring system usually collects a variety of log data, from the virtual machine level to the cluster level, on a regular basis. It then sends the data to a central detection module, which must analyze the aggregated time series to detect anomalous events, including hardware breakdowns, unavailable services and cyber attacks.
This requires an online detector capable of making reliable detections (i.e., with high sensitivity and specificity); otherwise, it could incur unnecessary maintenance costs.
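The abstract does not define LTI, so this standard-library Python sketch does not implement it. It only illustrates the general prediction-driven, online pattern described for the time-series detector:

- A lightweight seasonal decomposition forecasts each point. It uses an additive, Holt-Winters-style level plus a per-phase seasonal term, without a trend.
- Each point is scored chronologically, as it arrives, by its forecast error. The error is standardized with running error statistics (Welford's algorithm).

The following are all assumptions:

- the class name;
- the smoothing factors `alpha` and `gamma`;
- the warm-up length;
- the z-score-style output, which is used here rather than a probability.

```python
import math
import random

class OnlineSeasonalDetector:
    """Hypothetical prediction-driven online scorer (NOT the paper's LTI metric).

    Forecast = level + seasonal[phase], an additive Holt-Winters-style
    decomposition without a trend term. Each point is scored by how far its
    forecast error lies from the running mean error, in standard deviations.
    """

    def __init__(self, period, alpha=0.3, gamma=0.3, warmup=None):
        self.period, self.alpha, self.gamma = period, alpha, gamma
        self.warmup = 3 * period if warmup is None else warmup
        self.level = None
        self.season = [0.0] * period
        self.t = 0
        self.n, self.err_mean, self.err_m2 = 0, 0.0, 0.0  # Welford accumulators

    def update(self, x):
        """Score x using only past data, then fold x into the model."""
        phase = self.t % self.period
        if self.level is None:
            self.level = x
        err = x - (self.level + self.season[phase])
        score = 0.0
        if self.t >= self.warmup:
            # Only score once enough post-warm-up errors have been collected.
            if self.n >= max(self.period, 2):
                std = math.sqrt(self.err_m2 / (self.n - 1))
                score = abs(err - self.err_mean) / max(std, 1e-9)
            # Welford update of the running error mean and variance.
            self.n += 1
            delta = err - self.err_mean
            self.err_mean += delta / self.n
            self.err_m2 += delta * (err - self.err_mean)
        # Exponential smoothing of the level and of this phase's seasonal term.
        self.level = self.alpha * (x - self.season[phase]) + (1 - self.alpha) * self.level
        self.season[phase] = (self.gamma * (x - self.level)
                              + (1 - self.gamma) * self.season[phase])
        self.t += 1
        return score

# Toy series: seasonality with period 24, Gaussian noise and one injected spike.
rng = random.Random(0)
series = [10 + 5 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.5) for t in range(300)]
series[200] += 15.0
detector = OnlineSeasonalDetector(period=24)
scores = [detector.update(x) for x in series]  # processed strictly in time order
print(max(range(len(scores)), key=scores.__getitem__))  # index of the top-scoring point
```

Each call to `update` does a constant amount of work, so scoring costs O(1) per point, which is the property an online setting needs. A deployed detector would still need a rule for turning scores into alarms.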
Anomaly detection is an important problem that has been well studied within diverse research areas and application domains. The aim of this survey is twofold. First, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Second, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art deep anomaly detection techniques into different categories based on their underlying assumptions and the approach adopted. Within each category, we outline the basic anomaly detection technique and its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. In addition, for each category, we present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open research issues and the challenges faced when adopting deep anomaly detection techniques for real-world problems.
Today's Cyber-Physical Systems (CPSs) are large, complex, and equipped with networked sensors and actuators that are targets for cyber-attacks. Conventional detection techniques are unable to deal with the increasingly dynamic and complex nature of CPSs. On the other hand, the networked sensors and actuators generate large amounts of data streams that can be continuously monitored for intrusion events. Unsupervised machine learning techniques can be used to model the system behaviour and classify deviant behaviours as possible attacks. In this work, we proposed a novel Generative Adversarial Networks-based Anomaly Detection (GAN-AD) method for such complex networked CPSs. We used an LSTM-RNN in our GAN to capture the distribution of the multivariate time series of the sensors and actuators under normal working conditions of a CPS. Instead of treating each sensor's and actuator's time series independently, we modeled the time series of multiple sensors and actuators in the CPS concurrently to take into account potential latent interactions between them. To exploit both the generator and the discriminator of our GAN, we deployed the GAN-trained discriminator together with the residuals between the generator-reconstructed data and the actual samples to detect possible anomalies in the complex CPS. We used our GAN-AD to distinguish abnormal, attacked situations from normal working conditions for a complex six-stage Secure Water Treatment (SWaT) system. Experimental results showed that the proposed strategy identifies anomalies caused by various attacks effectively, with a higher detection rate and a lower false positive rate than existing methods.
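GAN-AD's final detection step combines two signals:

- how unrealistic the discriminator finds a time step;
- how far the generator's reconstruction is from the observed sample.

One minimal way to combine them, in standard-library Python, is a weighted sum of a normalized residual and one minus the discriminator's probability. This sketch assumes that the reconstructions and discriminator outputs have already been computed. The function name `gan_ad_scores` and the following choices are illustrative assumptions, not values taken from the paper:

- the weight `lam`;
- an L1 residual summed across all sensors and actuators;
- normalization by the largest residual;
- a 0.5 threshold.

How the generator produces its reconstructions (for example, by searching its latent space) is outside this sketch.

```python
def gan_ad_scores(actual, reconstructed, disc_prob, lam=0.5):
    """Combine reconstruction residuals and discriminator outputs into one score.

    actual, reconstructed: per-time-step vectors over all sensors/actuators.
    disc_prob: discriminator's probability that each time step is real (normal).
    lam: hypothetical weight between the residual and discrimination terms.
    """
    # L1 residual across every sensor and actuator at each time step, so the
    # channels are assessed jointly rather than one series at a time.
    residuals = [sum(abs(a - r) for a, r in zip(x, x_hat))
                 for x, x_hat in zip(actual, reconstructed)]
    scale = max(residuals) or 1.0  # normalise residuals to [0, 1]
    return [lam * (res / scale) + (1 - lam) * (1 - d)
            for res, d in zip(residuals, disc_prob)]

# Toy example: 3 sensors/actuators over 4 time steps; step 2 is "attacked".
actual = [[1.0, 0.5, 0.0], [1.1, 0.5, 0.0], [5.0, 0.5, 1.0], [1.0, 0.4, 0.0]]
recon = [[1.0, 0.5, 0.0], [1.0, 0.5, 0.0], [1.2, 0.5, 0.0], [1.0, 0.5, 0.0]]
disc = [0.95, 0.90, 0.10, 0.92]
scores = gan_ad_scores(actual, recon, disc)
flags = [s > 0.5 for s in scores]  # hypothetical threshold
print(flags)  # [False, False, True, False]
```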