 Anomaly Detection


AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

Neural Information Processing Systems

Analyzing the distribution shift of data is a growing research direction in today's Machine Learning (ML), leading to new benchmarks that provide suitable scenarios for studying the generalization properties of ML models. The existing benchmarks focus on supervised learning and, to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built on Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of a shifting input distribution: it covers a large time span (10 years), with naturally occurring changes over time (e.g.


OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

Neural Information Processing Systems

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often results in unfair comparisons and inconclusive results. From the problem setting perspective, OOD detection is closely related to neighboring fields including anomaly detection (AD), open set recognition (OSR), and model uncertainty, since methods developed for one domain are often applicable to the others. To help the community improve evaluation and advance the field, we build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields and provides a comprehensive benchmark under the recently proposed generalized OOD detection framework. With a comprehensive comparison of these methods, we are gratified to find that the field has progressed significantly over the past few years, with both preprocessing methods and the orthogonal post-hoc methods showing strong potential. We invite readers to use our OpenOOD codebase to develop and contribute. The full experimental results are available in this table.


ADBench: Anomaly Detection Benchmark

Neural Information Processing Systems

Given a long list of anomaly detection algorithms developed in the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to our best knowledge) the most comprehensive anomaly detection benchmark with 30 algorithms on 57 benchmark datasets, named ADBench. Our extensive experiments (98,436 in total) identify meaningful insights into the role of supervision and anomaly types, and unlock future directions for researchers in algorithm selection and design. With ADBench, researchers can efficiently conduct comprehensive and fair evaluations for newly proposed methods on the datasets (including our contributed ones from natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.


GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection

Neural Information Processing Systems

Despite a long history of traditional Graph Anomaly Detection (GAD) algorithms and the recent popularity of Graph Neural Networks (GNNs), it is still not clear (1) how they perform under a standard comprehensive setting, (2) whether GNNs can outperform traditional algorithms such as tree ensembles, and (3) how efficient they are on large-scale graphs. In response, we introduce GADBench, a benchmark tool dedicated to supervised anomalous node detection in static graphs. GADBench facilitates a detailed comparison across 29 distinct models on ten real-world GAD datasets, ranging from thousands to millions (~6M) of nodes. Our main finding is that tree ensembles with simple neighborhood aggregation can outperform the latest GNNs tailored for the GAD task. We shed light on the current progress of GAD, setting a robust groundwork for subsequent investigations in this domain.


Appendix A Related Work of AUC Optimization

Neural Information Processing Systems

OpenAUC is naturally related to AUC [23] due to the pairwise formulation in Eq. (12) and the surrogate loss used in Eq. (15). Specifically, for a binary classification problem, AUC, the Area Under the ROC Curve, measures the probability that a positive instance is ranked higher than a negative one. Benefiting from this property, AUC is essentially insensitive to label distribution and thus has become a popular metric for imbalanced scenarios such as disease prediction [37] and novelty detection [18]. As pointed out in [23], optimizing the AUC performance cannot be realized by the traditional learning paradigm that minimizes the error rate. Consequently, how to optimize the AUC performance has attracted wide attention.
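The pairwise reading of AUC described above can be illustrated with a minimal Python sketch. It is not the OpenAUC formulation from the paper, only the plain empirical AUC: the fraction of (positive, negative) pairs in which the positive instance receives the higher score. The function name `pairwise_auc`, the tie-handling convention (a tie counts as half a win), and the toy scores are assumptions made for illustration.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (a common convention, assumed here).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: two positives and two negatives.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(pairwise_auc(scores, labels))  # 0.75

# Repeating the negatives ten times changes the label distribution
# but leaves the pairwise ranking probability unchanged.
imbalanced_scores = [0.9, 0.3] + [0.8, 0.1] * 10
imbalanced_labels = [1, 1] + [0, 0] * 10
print(pairwise_auc(imbalanced_scores, imbalanced_labels))  # 0.75
```

The second call shows the insensitivity to label distribution mentioned in the text: the class ratio moves from 1:1 to 1:10, yet the value stays the same because every pair is simply counted ten times.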


Extreme bandits

Neural Information Processing Systems

In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values. In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward. However, in other problems such as network intrusion detection, we are interested in detecting the most extreme value output by the sources. Therefore, in our work we study extreme regret which measures the efficiency of an algorithm compared to the oracle policy selecting the source with the heaviest tail.
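As a rough illustration of the extreme regret idea, the following Python sketch estimates, by Monte Carlo simulation, the gap between the largest value seen by an oracle that always queries the heaviest-tailed source and the largest value seen by a given policy. This is a sketch under stated assumptions, not the algorithm or the exact definition from the paper: the Pareto sources, the tail indices, the fixed non-adaptive policies, and the names `pareto_sample` and `estimate_extreme_regret` are all hypothetical choices made for illustration.

```python
import random


def pareto_sample(rng, alpha):
    """Inverse-transform sample from a Pareto distribution with scale 1.

    Smaller alpha means a heavier tail. Samples are always >= 1.
    """
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def estimate_extreme_regret(policy, alphas, horizon, trials, seed=0):
    """Average of (oracle maximum - policy maximum) over simulated runs.

    `policy` maps a round index t to the index of the source to query.
    The oracle always queries the source with the smallest alpha.
    """
    rng = random.Random(seed)
    best = min(range(len(alphas)), key=lambda k: alphas[k])
    total_gap = 0.0
    for _ in range(trials):
        oracle_max = 0.0
        policy_max = 0.0
        for t in range(horizon):
            # Draw one value per source each round so that the oracle and
            # the policy are compared on the same realizations.
            draws = [pareto_sample(rng, a) for a in alphas]
            oracle_max = max(oracle_max, draws[best])
            policy_max = max(policy_max, draws[policy(t)])
        total_gap += oracle_max - policy_max
    return total_gap / trials


alphas = [1.5, 3.0]  # assumed tail indices; source 0 has the heavier tail

# A policy that alternates between the two sources.
round_robin_gap = estimate_extreme_regret(lambda t: t % 2, alphas, 200, 500)
print("round-robin estimate:", round_robin_gap)

# A policy that coincides with the oracle has zero gap by construction.
print("oracle estimate:", estimate_extreme_regret(lambda t: 0, alphas, 200, 500))
```

Unlike the usual mean-reward regret, the quantity compared here is a maximum over all rounds, so a single large draw from the heavy-tailed source can dominate the outcome.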


Appendix

Neural Information Processing Systems

This document is the appendix of the paper "Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection". This section presents the derivation of Eq. (7) (see A.1) and the explanation of Eq. (14) (see A.2). In addition, the time complexity analysis of iGAD is given in A.3. The analysis states Propositions 2 and 3, each concerning a matrix A, and proves Proposition 3 on the basis of Proposition 2. In summary, the final computational complexity of iGAD is O(n + |E|). This section introduces the details of the parameter settings (see B.1) and additional parameter analysis experiments (see B.2). Figure 4: The patterns from left to right are 3-star, triangle, tailed triangle, and 4-cycle graphlets.


Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection Ge Zhang

Neural Information Processing Systems

Graph-level anomaly detection aims to distinguish anomalous graphs in a graph dataset from normal graphs. Anomalous graphs represent very few but essential patterns in the real world. The anomalous property of a graph may be attributable to anomalous attributes of particular nodes and to anomalous substructures, that is, subsets of nodes and edges in the graph. In addition, due to the imbalanced nature of the anomaly problem, anomalous information is diluted by the overwhelming number of normal graphs. The variety of anomaly notions in attributes and/or substructures, together with this imbalance, makes detecting anomalous graphs a non-trivial task.


On Integrated Clustering and Outlier Detection

Neural Information Processing Systems

The advantages of combining clustering and outlier selection include: (i) the resulting clusters tend to be compact and semantically coherent, (ii) the clusters are more robust against data perturbations, and (iii) the outliers are contextualised by the clusters and more interpretable. We provide a practical subgradient-based algorithm for the problem and also study the theoretical properties of the algorithm in terms of approximation and convergence. Extensive evaluation on synthetic and real data sets attests to both the quality and scalability of our proposed method.


DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection Xuanwen Huang, Yang Yang, Yang Wang

Neural Information Processing Systems

Graph Anomaly Detection (GAD) has recently become an active research topic due to its practical and theoretical value. Since GAD is application-driven and anomalous samples are rare, enriching the variety of its datasets is fundamental.