An outlier detection method may be considered fair over specified sensitive attributes if the results of outlier detection are not skewed towards particular groups defined on such sensitive attributes. In this paper, we consider, for the first time to the best of our knowledge, the task of fair outlier detection over multiple multi-valued sensitive attributes (e.g., gender, race, religion, nationality, marital status). We propose a fair outlier detection method, FairLOF, that is inspired by the popular LOF formulation for neighborhood-based outlier detection. We outline ways in which unfairness could be induced within LOF and develop three heuristic principles to enhance fairness, which form the basis of the FairLOF method. Since this is a novel task, we develop an evaluation framework for fair outlier detection and use it to benchmark FairLOF on the quality and fairness of its results. Through an extensive empirical evaluation over real-world datasets, we illustrate that FairLOF is able to achieve significant improvements in fairness at the cost of degradations in result quality that are sometimes marginal, as measured against the fairness-agnostic LOF method.
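The notion of results being "skewed towards particular groups" can be made concrete with a small sketch. The function below compares each group's share among the flagged outliers with its share in the whole dataset. This is only an illustrative measure under assumed names (`group_skew`, `groups`, `outlier_idx`); it is not the evaluation framework or the FairLOF method described in the abstract.

```python
from collections import Counter


def group_skew(groups, outlier_idx):
    """Per group: share among flagged outliers minus share in the dataset.

    groups      -- list with one sensitive-attribute value per record (hypothetical input)
    outlier_idx -- indices of the records flagged as outliers by any detector
    A value near 0 means the group is flagged roughly in proportion to its size.
    """
    if not groups or not outlier_idx:
        raise ValueError("need at least one record and one flagged outlier")
    n = len(groups)
    k = len(outlier_idx)
    base = Counter(groups)                            # group sizes in the dataset
    flagged = Counter(groups[i] for i in outlier_idx) # group sizes among outliers
    return {g: flagged.get(g, 0) / k - base[g] / n for g in base}


# Toy data: group "a" is 3 of 8 records but 2 of the 3 flagged outliers.
groups = ["a", "a", "a", "b", "b", "b", "b", "b"]
skew = group_skew(groups, [0, 1, 3])
print(skew)
```

A positive value for a group indicates that it is over-represented among the outliers relative to its size; the values always sum to zero across groups.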
Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations the typology employs four dimensions: data type, cardinality of relationship, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
Learning classifiers for misuse and anomaly detection using a bag of system calls representation. Anomaly detection in health data based on deep learning. Abnormal human activity recognition using SVM based approach. Anomaly detection of gas turbines based on normal pattern extraction. Contextual anomaly detection for a critical industrial system based on logs and metrics.
Today during its annual IBM Think conference, IBM announced the launch of Watson AIOps, a service that taps AI to automate the real-time detection, diagnosis, and remediation of network anomalies. It also unveiled new offerings targeting the rollout of 5G technologies and the devices on those networks, as well as a coalition of telecommunications partners -- the IBM Telco Network Cloud Ecosystem -- that will work with IBM to deploy edge computing technologies. Watson AIOps marks IBM's foray into the mammoth AIOps market, which is expected to grow from $2.55 billion in 2018 to $11.02 billion by 2023, according to Markets and Markets. That might be a conservative projection in light of the pandemic, which is forcing IT teams to increasingly conduct their work remotely. In lieu of access to infrastructure, tools like Watson AIOps could help prevent major outages, the cost of which a study from Aberdeen pegged at $260,000 per hour.
Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples, which result in incorrect outputs. To make DL more robust, several post hoc anomaly detection techniques to detect (and discard) these anomalous samples have been proposed in the recent past. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL-based applications. We provide a taxonomy for existing techniques based on their underlying assumptions and adopted approaches. We discuss various techniques in each of the categories and provide the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to the different categories in which research has been done on this topic. Finally, we highlight the unsolved research challenges in applying anomaly detection techniques in DL systems and present some high-impact future research directions.
We propose a simple yet effective method for detecting anomalous instances on an attributed graph with label information for a small number of instances. Although standard anomaly detection methods usually assume that instances are independent and identically distributed, in many real-world applications instances are explicitly connected with each other, resulting in so-called attributed graphs. The proposed method embeds nodes (instances) on the attributed graph in the latent space by taking into account their attributes as well as the graph structure based on graph convolutional networks (GCNs). To learn node embeddings specialized for anomaly detection, in which there is a class imbalance due to the rarity of anomalies, the parameters of a GCN are trained to minimize the volume of a hypersphere that encloses the node embeddings of normal instances while embedding anomalous ones outside the hypersphere. This enables us to detect anomalies by simply calculating the distances between the node embeddings and the hypersphere center. The proposed method can effectively propagate label information from a small number of nodes to unlabeled ones by taking into account node attributes, graph structure, and class imbalance. In experiments with five real-world attributed graph datasets, we demonstrate that the proposed method achieves better performance than various existing anomaly detection methods.
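The scoring step described above, the distance between a node embedding and the hypersphere center, can be sketched in a few lines. The sketch assumes the embeddings have already been produced by some trained model; the GCN training is not shown. Using the centroid of known-normal embeddings as the center is an assumption made here for illustration, and the names `centroid` and `hypersphere_scores` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equal-length vectors (assumed choice of center)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def hypersphere_scores(embeddings, center):
    """Anomaly score = Euclidean distance from each embedding to the center."""
    return [math.dist(e, center) for e in embeddings]


# Toy 2-D embeddings of instances known to be normal.
normal = [[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0]]
center = centroid(normal)

# One embedding at the center and one far away from it.
scores = hypersphere_scores([[0.0, 0.0], [3.0, 4.0]], center)
print(scores)
```

Embeddings with a larger distance to the center are ranked as more anomalous; a threshold on this distance would play the role of the hypersphere radius.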
This paper presents a novel approach for temporal modelling of long-term human activities based on wavelet transforms. The model is applied to binary smart-home sensors to forecast their signals, which are then used as temporal priors to infer anomalies in office and Active & Assisted Living (AAL) scenarios. Such inference is performed by a new extension of Hybrid Markov Logic Networks (HMLNs) that merges different anomaly indicators, including activity levels detected by sensors, expert rules and the new temporal models. The latter in particular allow the inference system to discover deviations from long-term activity patterns, which cannot be detected by simpler frequency-based models. Two new publicly available datasets were collected using several smart sensors to evaluate the wavelet-based temporal models and their application to signal forecasting and anomaly detection. The experimental results show the effectiveness of the proposed techniques and their successful application to detect unexpected activities in office and AAL settings.
To overcome the energy and bandwidth limitations of traditional IoT systems, edge computing, or information extraction at the sensor node, has become popular. This makes it important to create very low energy information extraction or pattern recognition systems. In this paper, we present an approximate computing method to reduce the computation energy of a specific type of IoT system used for anomaly detection (e.g. in predictive maintenance, epileptic seizure detection, etc.). Termed Anomaly Detection Based Power Savings (ADEPOS), our proposed method uses low precision computing and low complexity neural networks at the beginning, when it is easy to distinguish healthy data. However, on the detection of anomalies, the complexity of the network and the computing precision are adaptively increased for accurate predictions. We show that ensemble approaches are well suited for adaptively changing network size. To validate our proposed scheme, a chip has been fabricated in a UMC 65nm process that includes an MSP430 microprocessor along with an on-chip switching mode DC-DC converter for dynamic voltage and frequency scaling. Using the NASA bearing dataset for machine health monitoring, we show that with ADEPOS we can achieve an 8.95X saving of energy along the lifetime without losing any detection accuracy. The energy savings are obtained by reducing the execution time of the neural network on the microprocessor.
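The idea of growing an ensemble only when the data starts to look anomalous can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not the ADEPOS implementation: the function name `adaptive_ensemble_sizes`, the averaging of member scores, the one-member-per-step escalation, and the fallback to a single member on healthy data are all hypothetical choices, and precision scaling is not modeled.

```python
def adaptive_ensemble_sizes(scores_per_step, threshold, max_models):
    """Return how many ensemble members are evaluated at each time step.

    scores_per_step[t][m] -- anomaly score of member m at step t (hypothetical input)
    threshold             -- mean score above which the step looks anomalous
    max_models            -- upper bound on the number of active members
    """
    active = 1                      # start cheap: a single small model
    used = []
    for step_scores in scores_per_step:
        # Only the currently active members are "run" and averaged.
        score = sum(step_scores[:active]) / active
        used.append(active)
        if score > threshold:
            # Suspected anomaly: spend more computation on the next step.
            active = min(active + 1, max_models)
        else:
            # Healthy data: fall back to the cheapest configuration (assumed rule).
            active = 1
    return used


# Toy trace: healthy, two suspicious steps, then healthy again.
trace = [[0.1] * 3, [0.9] * 3, [0.9] * 3, [0.1] * 3, [0.1] * 3]
sizes = adaptive_ensemble_sizes(trace, threshold=0.5, max_models=3)
print(sizes)
```

In this toy trace most steps run a single member, so the total number of model evaluations stays well below that of always running the full ensemble, which is the intuition behind the energy savings reported above.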
In this paper we introduce Anomaly Contribution Explainer, or ACE, a tool to explain security anomaly detection models in terms of the model features through a regression framework, and its variant, ACE-KL, which highlights the important anomaly contributors. ACE and ACE-KL provide insights in diagnosing which attributes significantly contribute to an anomaly by building a specialized linear model to locally approximate the anomaly score that a black-box model generates. We conducted experiments with these anomaly detection models to detect security anomalies on both synthetic data and real data. In particular, we evaluate performance on three public data sets: CERT insider threat, netflow logs, and Android malware. The experimental results are encouraging: our methods consistently identify the correct contributing feature in the synthetic data where ground truth is available; similarly, for real data sets, our methods point a security analyst in the direction of the underlying causes of an anomaly, including in one case leading to the discovery of previously overlooked network scanning activity. We have made our source code publicly available. Cyber-security is a key concern for both private and public organizations, given the high cost of security compromises and attacks; malicious cyber-activity cost the U.S. economy between $57 billion and $109 billion in 2016. As a result, spending on security research and development, and on security products and services to detect and combat cyber-attacks, has been increasing. Organizations produce large amounts of network, host and application data that can be used to gain insights into cyber-security threats, misconfigurations, and network operations. While security domain experts can manually sift through some amount of data to spot attacks and understand them, it is virtually impossible to do so at scale, considering that even a medium-sized enterprise can produce terabytes of data in a few hours.
Artificial intelligence applied to healthcare includes a collection of technologies that enable machines to sense, interpret, act and learn. AI implementations for digital health can be relatively simple when they are focused largely on personal patient engagement, or vastly complex when working with big data sets, highly specialized diagnostics, and the workflows of multiple highly complicated organizations. The addition of Internet of Things sensor data from connected health and related devices adds a new layer of critical, real-time contextual data. The use cases for IoT sensor-informed healthcare applications can include home security and access control for vulnerable loved ones, remote patient monitoring of vitals, activity monitoring and anomaly detection, safety of the home and mobile environments, environmental monitoring for chronic conditions, and more. These use cases provide opportunities for the application of artificial intelligence and machine learning to transform healthcare functions into data-driven services that can improve outcomes and deliver healthcare more efficiently.