iforest
- South America > Paraguay > Asunción > Asunción (0.04)
- South America > Brazil (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Isolation-based Spherical Ensemble Representations for Anomaly Detection
Cao, Yang, Yang, Sikun, Tian, Hao, He, Kai, Qi, Lianyong, Liu, Ming, Yang, Yujiu
Anomaly detection is a critical task in data mining and management with applications spanning fraud detection, network security, and log monitoring. Despite extensive research, existing unsupervised anomaly detection methods still face fundamental challenges including conflicting distributional assumptions, computational inefficiency, and difficulty handling different anomaly types. To address these problems, we propose ISER (Isolation-based Spherical Ensemble Representations), which extends existing isolation-based methods by using hypersphere radii as proxies for local density characteristics while maintaining linear time and constant space complexity. ISER constructs ensemble representations where hypersphere radii encode density information: smaller radii indicate dense regions while larger radii correspond to sparse areas. We introduce a novel similarity-based scoring method that measures pattern consistency by comparing ensemble representations against a theoretical anomaly reference pattern. Additionally, we enhance the performance of Isolation Forest by using ISER and adapting the scoring function to address axis-parallel bias and local anomaly detection limitations. Comprehensive experiments on 22 real-world datasets demonstrate ISER's superior performance over 11 baseline methods. Anomaly detection is the task of identifying data points that deviate significantly from the majority of observations, with applications in fraud detection, network security, and quality control (Chandola et al., 2009; Liu et al., 2024; Tang et al., 2024; Song et al., 2023). Despite extensive research, developing effective unsupervised anomaly detection methods remains challenging due to several fundamental limitations. Existing methods face a critical trade-off between computational efficiency and handling varying local densities. Density-based methods like Local Outlier Factor (Breunig et al., 2000) handle varying local densities but require quadratic time complexity, limiting scalability.
- Oceania > Australia (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
Ye, Hangting, Li, Jinmeng, Zhao, He, Zhuge, Mingchen, Guo, Dandan, Chang, Yi, Zha, Hongyuan
Existing anomaly detection (AD) methods for tabular data usually rely on assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a "data processor" to an "algorithmist". Instead of being exposed to raw data, our framework leverages the LLM's ability to reason about algorithms. It analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code to synthesize "hard-to-detect" anomalies that exploit these vulnerabilities. This generated synthesis program, which is reusable across diverse datasets, is then instantiated to augment training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 tabular AD (TAD) benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > District of Columbia > Washington (0.04)
Randomized PCA Forest for Outlier Detection
Rajabinasab, Muhammad, Pakdaman, Farhad, Gabbouj, Moncef, Schneider-Kamp, Peter, Zimek, Arthur
We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Inspired by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we build the method on RPCA Forest. Experimental results show that the proposed approach outperforms classical and state-of-the-art methods on several datasets while performing competitively on the rest. The extensive analysis of the proposed method reflects its high generalization power and its computational efficiency, highlighting it as a good choice for unsupervised outlier detection. An outlier, as defined by Hawkins [18], is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Similarly, Barnett and Lewis [3] describe it as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data." Outlier detection is the process of identifying such outliers, i.e., the data points which differ from the rest of the data. It is one of the most important and fundamental tasks in data mining and machine learning with applications in intrusion detection [20], fault detection [37], fraud detection [7] and others [11], [13], [27]. In recent years, many methods have been proposed to carry out the outlier detection task [1], [9], [10], [23], [42]. Despite the demonstration of promising results, further studies show that these results might be limited only to specific instances of the problem (e.g., a limited selection of datasets, a specific kind of outliers, etc.) [6].
- North America > United States > Wisconsin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Finland (0.04)
- Europe > Denmark > Southern Denmark (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.67)
- Health & Medicine > Therapeutic Area (0.94)
- Law Enforcement & Public Safety (0.68)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Data-Driven Heat Pump Management: Combining Machine Learning with Anomaly Detection for Residential Hot Water Systems
Rahal, Manal, Ahmed, Bestoun S., Renstrom, Roger, Stener, Robert, Wurtz, Albrecht
Heat pumps (HPs) have emerged as a cost-effective and clean technology for sustainable energy systems, but their efficiency in producing hot water remains restricted by conventional threshold-based control methods. Although machine learning (ML) has been successfully implemented for various HP applications, optimization of household hot water demand forecasting remains understudied. This paper addresses this problem by introducing a novel approach that combines predictive ML with anomaly detection to create adaptive hot water production strategies based on household-specific consumption patterns. Our key contributions include: (1) a composite approach combining ML and isolation forest (iForest) to forecast household demand for hot water and steer responsive HP operations; (2) multi-step feature selection with advanced time-series analysis to capture complex usage patterns; (3) application and tuning of three ML models: Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bi-directional LSTM with the self-attention mechanism on data from different types of real HP installations; and (4) experimental validation on six real household installations. Our experiments show that the best-performing model, LightGBM, achieves RMSE improvements of up to 9.37% compared to LSTM variants, with $R^2$ values between 0.748 and 0.983. For anomaly detection, our iForest implementation achieved an F1-score of 0.87 with a false alarm rate of only 5.2%, demonstrating strong generalization capabilities across different household types and consumption patterns, making it suitable for real-world HP deployments.
- North America > United States (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > Sweden > Värmland County > Karlstad (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.93)
Theoretical Investigation on Inductive Bias of Isolation Forest
Zheng, Qin-Cheng, Zhang, Shao-Qun, Lyu, Shen-Huan, Jiang, Yuan, Zhou, Zhi-Hua
Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector valued for its exceptional runtime efficiency and performance on large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper theoretically investigates the conditions and extent of iForest's effectiveness by analyzing its inductive bias through the formulation of depth functions and growth processes. Since directly analyzing the depth function proves intractable due to iForest's random splitting mechanism, we model the growth process of iForest as a random walk, enabling us to derive the expected depth function using transition probabilities. Our case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor anomaly detectors. Our study provides theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
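For readers unfamiliar with the depth function mentioned in this abstract, the following is a minimal Monte Carlo sketch of the standard iForest depth and anomaly score, s(x) = 2^(-E[h(x)] / c(n)). It is not the random-walk analysis of the paper. The function names (`c`, `path_length`, `anomaly_score`) and the defaults (100 trees, subsample size 64) are illustrative assumptions, and the tree is grown lazily along the query point's path only rather than built in full.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x, points, rng, max_depth):
    """Depth reached by x under random axis-parallel splits of `points`."""
    depth = 0
    while len(points) > 1 and depth < max_depth:
        dim = rng.randrange(len(x))
        lo = min(p[dim] for p in points)
        hi = max(p[dim] for p in points)
        if lo == hi:
            # Simplification for this sketch: stop instead of trying another dimension.
            break
        split = rng.uniform(lo, hi)
        # Keep only the points that fall on the same side of the split as x.
        points = [p for p in points if (p[dim] < split) == (x[dim] < split)]
        depth += 1
    # Points left unsplit contribute the expected depth of a subtree of that size.
    return depth + c(len(points))


def anomaly_score(x, data, rng, n_trees=100, sample_size=64):
    """Score in (0, 1]; values near 1 mean x is isolated early. Assumes len(data) >= 2."""
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(max(psi, 2)))
    mean_depth = sum(
        path_length(x, rng.sample(data, psi), rng, max_depth) for _ in range(n_trees)
    ) / n_trees
    return 2.0 ** (-mean_depth / c(psi))
```

Under these assumptions, a point far from a dense cluster should be isolated after fewer splits than a point at the cluster's centre, and therefore receive a higher score.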
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)
Detecting Anomalies Using Rotated Isolation Forest
Monemizadeh, Vahideh, Kiani, Kourosh
The Isolation Forest (iForest), proposed by Liu, Ting, and Zhou in TKDD 2012, has become a prominent tool for unsupervised anomaly detection. However, recent research by Hariri, Kind, and Brunner, published in TKDE 2021, has revealed issues with iForest. They identified the presence of axis-aligned ghost clusters that can be misidentified as normal clusters, leading to biased anomaly scores and inaccurate predictions. In response, they developed the Extended Isolation Forest (EIF), which eliminates the ghost clusters introduced by iForest. This enhancement results in improved consistency of anomaly scores and superior performance. We reveal a previously overlooked problem in EIF, showing that it is vulnerable to ghost inter-clusters between normal clusters of data points. In this paper, we introduce the Rotated Isolation Forest (RIF) algorithm, which addresses both the axis-aligned ghost clusters observed in iForest and the ghost inter-clusters seen in EIF. RIF accomplishes this by randomly rotating the dataset (using random rotation matrices and QR decomposition) before feeding it into the iForest construction, thereby increasing dataset variation and eliminating ghost clusters. Our experiments show that RIF outperforms iForest and EIF on both synthetic and real-world datasets.
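The rotation step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `random_orthogonal_matrix` and `rotate` are hypothetical, and the orthonormal vectors are obtained with modified Gram-Schmidt, which yields the Q factor of a QR decomposition of a Gaussian matrix. The result may include a reflection, which preserves distances just as a pure rotation does. Whether a fresh rotation is drawn per tree or per forest is not stated in the abstract, so that choice is left open.

```python
import math
import random


def random_orthogonal_matrix(d, rng):
    """Return a d x d orthogonal matrix as a list of orthonormal rows.

    Each Gaussian vector is orthogonalised against the rows already accepted
    (modified Gram-Schmidt) and then normalised.
    """
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in rows:
            proj = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        rows.append([vi / norm for vi in v])
    return rows


def rotate(points, q):
    """Apply the orthogonal map q to every point (each a sequence of d floats)."""
    return [tuple(sum(qi * xi for qi, xi in zip(row, x)) for row in q) for x in points]


rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]
q = random_orthogonal_matrix(3, rng)
rotated = rotate(data, q)  # this copy would then be passed to an isolation forest
```

Because the map is orthogonal, pairwise distances in `rotated` match those in `data`. Only the orientation of the axes changes, and the orientation is what axis-parallel splits depend on.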
- Europe > Switzerland (0.04)
- Asia > Singapore (0.04)
- North America > United States > New Jersey (0.04)