RAD: On-line Anomaly Detection for Highly Unreliable Data

Zhao, Zilong; Birke, Robert; Han, Rui; Robu, Bogdan; Bouchenak, Sara; Mokhtar, Sonia Ben; Chen, Lydia Y.

arXiv.org Machine Learning 

Classification algorithms have been widely adopted to detect anomalies in various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformations that aim to mislead anomaly detection. In this paper, we present a two-layer online learning framework for robust anomaly detection (RAD) in the presence of unreliable anomaly labels, where the first layer filters out the suspicious data and the second layer detects the anomaly patterns from the remaining data. To adapt to the online nature of anomaly detection, we extend RAD with three additional features: repeated cleaning, conflicting opinions of classifiers, and oracle knowledge. We learn online from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity that comes with the larger accumulated data set. Moreover, we explore the concept of oracle learning, which provides additional information on the true labels of difficult data points. We specifically focus on three use cases: (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising the faces of 20 celebrities. Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, reaching up to 98% for IoT device attacks (i.e., +11%), up to 84% for cloud task failures (i.e., +20%) under 40% noise, and up to 74% for face recognition (i.e., +28%) under 30% noisy labels. The proposed RAD is general and can be applied to different anomaly detection algorithms.

Anomaly detection is one of the core operations for enforcing dependability and performance in modern distributed systems [29], [44]. Anomalies can take various forms, including erroneous data produced by a corrupted IoT device or the failure of a job executed in a datacenter [6], [7], [47].
Recent work has often dealt with this issue by relying on machine learning-based classification algorithms over system logs [11], [13] or backend-collected data [17], [46]. As workloads in real systems are highly dynamic over time, it is even more challenging to predict anomalies that cannot be easily distinguished from the system dynamics than it is in systems with static workloads. In this context, a rising concern when applying classification algorithms is access to a reliable ground truth for anomalies [9].

This work has been partly supported by the IRS (Initiative de Recherche Stratégique) program DATE. This work has been partly funded by the Swiss National Science Foundation NRP75 project 407540 167266 and TU Delft technology fellowship.
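The two-layer design described in the abstract (a first layer that filters suspicious data, a second layer that learns anomaly patterns from what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the `NearestCentroid` stand-in classifier, the function name `two_layer_online`, the agreement-based filtering rule, and the choice to trust the first batch as clean are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import fmean


class NearestCentroid:
    """Tiny stand-in classifier (hypothetical): picks the closest class centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # One centroid per class: the per-feature mean of its samples.
        self.centroids = {
            label: [fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def closest(features):
            return min(
                self.centroids,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(features, self.centroids[label])
                ),
            )
        return [closest(features) for features in X]


def two_layer_online(batches):
    """Consume (X, noisy_y) batches in arrival order.

    Layer 1 (label quality model) keeps only samples whose given label
    agrees with its own prediction. Layer 2 (anomaly detector) is trained
    on the accumulated cleaned data.
    """
    clean_X, clean_y = [], []
    label_model = None
    for X, y in batches:
        if label_model is None:
            # Assumption of this sketch: the first batch is trusted as clean.
            keep = list(range(len(X)))
        else:
            predicted = label_model.predict(X)
            keep = [j for j, (p, given) in enumerate(zip(predicted, y)) if p == given]
        clean_X += [X[j] for j in keep]
        clean_y += [y[j] for j in keep]
        # Retrain layer 1 on the growing cleaned data set.
        label_model = NearestCentroid().fit(clean_X, clean_y)
    detector = NearestCentroid().fit(clean_X, clean_y)
    return detector, clean_X, clean_y


# Usage: the last sample of the second batch carries a flipped (noisy) label.
batches = [
    ([[0.0], [0.1], [1.0], [0.9]], [0, 0, 1, 1]),
    ([[0.05], [0.95], [0.02]], [0, 1, 1]),
]
detector, clean_X, clean_y = two_layer_online(batches)
```

In this toy run the sample `[0.02]` labelled `1` is discarded by the first layer because the label quality model predicts `0` for it, so the detector is trained only on the six remaining samples. Any classifier could replace the stand-in, in line with the abstract's statement that the approach applies to different anomaly detection algorithms.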
