anomalous node
OCR-APT: Reconstructing APT Stories from Audit Logs using Subgraph Anomaly Detection and LLMs
Aly, Ahmed, Mansour, Essam, Youssef, Amr
Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs. Provenance graphs model these logs as connected entities and events, revealing relationships that are missed by linear log representations. Existing systems apply anomaly detection to these graphs but often suffer from high false positive rates and coarse-grained alerts. Their reliance on node attributes like file paths or IPs leads to spurious correlations, reducing detection robustness and reliability. To fully understand an attack's progression and impact, security analysts need systems that can generate accurate, human-like narratives of the entire attack. To address these challenges, we introduce OCR-APT, a system for APT detection and reconstruction of human-like attack stories. OCR-APT uses Graph Neural Networks (GNNs) for subgraph anomaly detection, learning behavior patterns around nodes rather than fragile attributes such as file paths or IPs. This approach leads to a more robust anomaly detection. It then iterates over detected subgraphs using Large Language Models (LLMs) to reconstruct multi-stage attack stories. Each stage is validated before proceeding, reducing hallucinations and ensuring an interpretable final report. Our evaluations on the DARPA TC3, OpTC, and NODLINK datasets show that OCR-APT outperforms state-of-the-art systems in both detection accuracy and alert interpretability. Moreover, OCR-APT reconstructs human-like reports that comprehensively capture the attack story.
GTHNA: Local-global Graph Transformer with Memory Reconstruction for Holistic Node Anomaly Evaluation
Li, Mingkang, Luo, Xuexiong, Zhang, Yue, Li, Yaoyang, Lin, Fu
Anomaly detection in graph-structured data is an inherently challenging problem, as it requires the identification of rare nodes that deviate from the majority in both their structural and behavioral characteristics. Existing methods, such as those based on graph convolutional networks (GCNs), often suffer from over-smoothing, which causes the learned node representations to become indistinguishable. Furthermore, graph reconstruction-based approaches are vulnerable to anomalous node interference during the reconstruction process, leading to inaccurate anomaly detection. In this work, we propose a novel and holistic anomaly evaluation framework that integrates three key components: a local-global Transformer encoder, a memory-guided reconstruction mechanism, and a multi-scale representation matching strategy. These components work synergistically to enhance the model's ability to capture both local and global structural dependencies, suppress the influence of anomalous nodes, and assess anomalies from multiple levels of granularity. Anomaly scores are computed by combining reconstruction errors and memory matching signals, resulting in a more robust evaluation. Extensive experiments on seven benchmark datasets demonstrate that our method outperforms existing state-of-the-art approaches, offering a comprehensive and generalizable solution for anomaly detection across various graph domains.
Unsupervised Graph Anomaly Detection via Multi-Hypersphere Heterophilic Graph Learning
Ni, Hang, Han, Jindong, Zhu, Nengjun, Liu, Hao
Graph Anomaly Detection (GAD) plays a vital role in various data mining applications such as e-commerce fraud prevention and malicious user detection. Recently, Graph Neural Network (GNN) based approach has demonstrated great effectiveness in GAD by first encoding graph data into low-dimensional representations and then identifying anomalies under the guidance of supervised or unsupervised signals. However, existing GNN-based approaches implicitly follow the homophily principle (i.e., the "like attracts like" phenomenon) and fail to learn discriminative embedding for anomalies that connect vast normal nodes. Moreover, such approaches identify anomalies in a unified global perspective but overlook diversified abnormal patterns conditioned on local graph context, leading to suboptimal performance. To overcome the aforementioned limitations, in this paper, we propose a Multi-hypersphere Heterophilic Graph Learning (MHetGL) framework for unsupervised GAD. Specifically, we first devise a Heterophilic Graph Encoding (HGE) module to learn distinguishable representations for potential anomalies by purifying and augmenting their neighborhood in a fully unsupervised manner. Then, we propose a Multi-Hypersphere Learning (MHL) module to enhance the detection capability for context-dependent anomalies by jointly incorporating critical patterns from both global and local perspectives. Extensive experiments on ten real-world datasets show that MHetGL outperforms 14 baselines. Our code is publicly available at https://github.com/KennyNH/MHetGL.
Graph Anomaly Detection via Adaptive Test-time Representation Learning across Out-of-Distribution Domains
Pirhayati, Delaram, Silva, Arlei
Graph Anomaly Detection (GAD) has demonstrated great effectiveness in identifying unusual patterns within graph-structured data. However, while labeled anomalies are often scarce in emerging applications, existing supervised GAD approaches are either ineffective or not applicable when moved across graph domains due to distribution shifts and heterogeneous feature spaces. To address these challenges, we present AdaGraph-T3, a novel test-time training framework for cross-domain GAD. AdaGraph-T3 combines supervised and self-supervised learning during training while adapting to a new domain during test time using only self-supervised learning by leveraging a homophily-based affinity score that captures domain-invariant properties of anomalies. Our framework introduces four key innovations to cross-domain GAD: an effective self-supervision scheme, an attention-based mechanism that dynamically learns edge importance weights during message passing, domain-specific encoders for handling heterogeneous features, and class-aware regularization to address imbalance. Experiments across multiple cross-domain settings demonstrate that AdaGraph-T3 significantly outperforms existing approaches, achieving average improvements of over 6.6% in AUROC and 7.9% in AUPRC compared to the best competing model.
High-Pass Graph Convolutional Network for Enhanced Anomaly Detection: A Novel Approach
Li, Shelei, Tan, Yong Chai, Vincent, Tai
Graph Convolutional Network (GCN) are widely used in Graph Anomaly Detection (GAD) due to their natural compatibility with graph structures, resulting in significant performance improvements. However, most researchers approach GAD as a graph node classification task and often rely on low-pass filters or feature aggregation from neighboring nodes. This paper proposes a novel approach by introducing a High-Pass Graph Convolution Network (HP-GCN) for GAD. The proposed HP-GCN leverages high-frequency components to detect anomalies, as anomalies tend to increase high-frequency signals within the network of normal nodes. Additionally, isolated nodes, which lack interactions with other nodes, present a challenge for Graph Neural Network (GNN). To address this, the model segments the graph into isolated nodes and nodes within connected subgraphs. Isolated nodes learn their features through Multi-Layer Perceptron (MLP), enhancing detection accuracy. The model is evaluated and validated on YelpChi, Amazon, T-Finance, and T-Social datasets. The results showed that the proposed HP-GCN can achieve anomaly detection accuracy of 96.10%, 98.16%, 96.46%, and 98.94%, respectively. The findings demonstrate that the HP-GCN outperforms existing GAD methods based on spatial domain GNN as well as those using low-pass and band-pass filters in spectral domain GCN. The findings underscore the effectiveness of this method in improving anomaly detection performance. Source code can be found at: https://github.com/meteor0033/High-pass_GAD.git.
ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection
He, Junwei, Xu, Qianqian, Jiang, Yangbangyan, Wang, Zitai, Huang, Qingming
Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social network. Although existing reconstruction-based methods have achieved considerable success, they may face the \textit{Anomaly Overfitting} and \textit{Homophily Trap} problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate \textit{Anomaly Overfitting}. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.
DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection
Huang, Xuanwen, Yang, Yang, Wang, Yang, Wang, Chunping, Zhang, Zhisheng, Xu, Jiarong, Chen, Lei, Vazirgiannis, Michalis
Graph Anomaly Detection (GAD) has recently become a hot research spot due to its practicability and theoretical value. Since GAD emphasizes the application and the rarity of anomalous samples, enriching the varieties of its datasets is fundamental. Thus, this paper presents DGraph, a real-world dynamic graph in the finance domain. DGraph overcomes many limitations of current GAD datasets. It contains about 3M nodes, 4M dynamic edges, and 1M ground-truth nodes. We provide a comprehensive observation of DGraph, revealing that anomalous nodes and normal nodes generally have different structures, neighbor distribution, and temporal dynamics. Moreover, it suggests that 2M background nodes are also essential for detecting fraudsters. Furthermore, we conduct extensive experiments on DGraph. Observation and experiments demonstrate that DGraph is propulsive to advance GAD research and enable in-depth exploration of anomalous nodes.
threaTrace: Detecting and Tracing Host-based Threats in Node Level Through Provenance Graph Learning
Wang, Su, Wang, Zhiliang, Zhou, Tao, Yin, Xia, Han, Dongqi, Zhang, Han, Sun, Hongbin, Shi, Xingang, Yang, Jiahai
--Host-based threats such as Program Attack, Malware Implantation, and Advanced Persistent Threats (APT), are commonly adopted by modern attackers. Recent studies propose leveraging the rich contextual information in data provenance to detect threats in a host. Data provenance is a directed acyclic graph constructed from system audit data. Nodes in a provenance graph represent system entities (e.g., processes and files) and edges represent system calls in the direction of information flow. However, previous studies, which extract features of the whole provenance graph, are not sensitive to the small number of threat-related entities and thus result in low performance when hunting stealthy threats. We tailor GraphSAGE, an inductive graph neural network, to learn every benign entity's role in a provenance graph. OW ADA YS, attackers tend to perform intrusion activities in important hosts of those big enterprises and governments [1]. They usually exploit zero-day ...
ResGCN: Attention-based Deep Residual Modeling for Anomaly Detection on Attributed Networks
Pei, Yulong, Huang, Tianjin, van Ipenburg, Werner, Pechenizkiy, Mykola
Effectively detecting anomalous nodes in attributed networks is crucial for the success of many real-world applications such as fraud and intrusion detection. Existing approaches have difficulties with three major issues: sparsity and nonlinearity capturing, residual modeling, and network smoothing. We propose Residual Graph Convolutional Network (ResGCN), an attention-based deep residual modeling approach that can tackle these issues: modeling the attributed networks with GCN allows to capture the sparsity and nonlinearity; utilizing a deep neural network allows to directly learn residual from the input, and a residual-based attention mechanism reduces the adverse effect from anomalous nodes and prevents over-smoothing. Extensive experiments on several real-world attributed networks demonstrate the effectiveness of ResGCN in detecting anomalies.