AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.88)

Parikshit Gopalan, Vatsal Sharan, Udi Wieder

PIDForest: Anomaly Detection via Partial Identification

Neural Information Processing Systems, Feb-14-2026, 22:51:38 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, dataset, pidforest, (17 more...)

Country:

South America > Paraguay > Asunción > Asunción (0.04)
South America > Brazil (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-7-2026, 21:15:08 GMT

Supplementary Material: Automatic Unsupervised Outlier Model Selection. Details on Models, Meta-features, Datasets/Testbeds, Optimization, Pseudo Code, and Detailed Experiment Results. A METAOD Model Set

In the classical matrix factorization setting, some entries of the performance matrix P are missing.

artificial intelligence, iforest, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Oct-16-2025

Isolation-based Spherical Ensemble Representations for Anomaly Detection

Cao, Yang, Yang, Sikun, Tian, Hao, He, Kai, Qi, Lianyong, Liu, Ming, Yang, Yujiu

Anomaly detection is a critical task in data mining and management with applications spanning fraud detection, network security, and log monitoring. Despite extensive research, existing unsupervised anomaly detection methods still face fundamental challenges including conflicting distributional assumptions, computational inefficiency, and difficulty handling different anomaly types. To address these problems, we propose ISER (Isolation-based Spherical Ensemble Representations) that extends existing isolation-based methods by using hypersphere radii as proxies for local density characteristics while maintaining linear time and constant space complexity. ISER constructs ensemble representations where hypersphere radii encode density information: smaller radii indicate dense regions while larger radii correspond to sparse areas. We introduce a novel similarity-based scoring method that measures pattern consistency by comparing ensemble representations against a theoretical anomaly reference pattern. Additionally, we enhance the performance of Isolation Forest by using ISER and adapting the scoring function to address axis-parallel bias and local anomaly detection limitations. Comprehensive experiments on 22 real-world datasets demonstrate ISER's superior performance over 11 baseline methods. Anomaly detection is the task of identifying data points that deviate significantly from the majority of observations, with applications in fraud detection, network security, and quality control (Chandola et al., 2009; Liu et al., 2024; Tang et al., 2024; Song et al., 2023). Despite extensive research, developing effective unsupervised anomaly detection methods remains challenging due to several fundamental limitations. Existing methods face a critical trade-off between computational efficiency and handling varying local densities. Density-based methods like Local Outlier Factor (Breunig et al., 2000) address this but require quadratic time complexity, limiting scalability.
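
The idea that a hypersphere radius can stand in for local density can be illustrated with a small sketch. This is not the ISER algorithm itself: the abstract does not spell out the construction, so the sketch assumes that each ensemble member samples a few centres, gives every centre a radius equal to the distance to its nearest fellow centre, and maps a query point to the radius of its closest centre. The function name `radius_representation` and the toy data are hypothetical, and the similarity-based scoring step is left out.

```python
import math
import random


def radius_representation(x, data, n_members, sample_size, rng):
    """Map x to one hypersphere radius per ensemble member (hypothetical helper).

    sample_size must be at least 2 so every centre has a fellow centre.
    """
    representation = []
    for _ in range(n_members):
        # Assumption: each member draws a small random subsample as sphere centres.
        centers = rng.sample(data, sample_size)
        # Assumption: a centre's radius is the distance to its nearest fellow
        # centre, so spheres are small in dense regions and large in sparse ones.
        radii = [
            min(math.dist(c, other) for j, other in enumerate(centers) if j != i)
            for i, c in enumerate(centers)
        ]
        # x takes the radius of the sphere whose centre is closest to it.
        nearest = min(range(sample_size), key=lambda i: math.dist(x, centers[i]))
        representation.append(radii[nearest])
    return representation


# Toy data: a tight 7 x 7 grid (dense) plus five widely spaced points (sparse).
dense = [(0.01 * i, 0.01 * j) for i in range(7) for j in range(7)]
sparse = [(10.0 + 3.0 * k, 10.0) for k in range(5)]
data = dense + sparse

rng = random.Random(0)
dense_rep = radius_representation((0.03, 0.03), data, 50, 16, rng)
sparse_rep = radius_representation((13.0, 10.0), data, 50, 16, rng)
print(sum(dense_rep) / len(dense_rep), sum(sparse_rep) / len(sparse_rep))
```

In this toy setting the query inside the grid receives small radii and the query among the widely spaced points receives larger ones on average, which matches the dense versus sparse reading described in the abstract.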

data mining, hypersphere, machine learning, (21 more...)

2510.13311

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Oct-7-2025

LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis

Ye, Hangting, Li, Jinmeng, Zhao, He, Zhuge, Mingchen, Guo, Dandan, Chang, Yi, Zha, Hongyuan

Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance in real-world scenarios. While Large Language Models (LLMs) show remarkable reasoning capabilities, their direct application to tabular AD is impeded by fundamental challenges, including difficulties in processing heterogeneous data and significant privacy risks. To address these limitations, we propose LLM-DAS, a novel framework that repositions the LLM from a "data processor" to an "algorithmist". Instead of being exposed to raw data, our framework leverages the LLM's ability to reason about algorithms. It analyzes a high-level description of a given detector to understand its intrinsic weaknesses and then generates detector-specific, data-agnostic Python code to synthesize "hard-to-detect" anomalies that exploit these vulnerabilities. This generated synthesis program, which is reusable across diverse datasets, is then instantiated to augment training data, systematically enhancing the detector's robustness by transforming the problem into a more discriminative two-class classification task. Extensive experiments on 36 TAD benchmarks show that LLM-DAS consistently boosts the performance of mainstream detectors. By bridging LLM reasoning with classic AD algorithms via programmatic synthesis, LLM-DAS offers a scalable, effective, and privacy-preserving approach to patching the logical blind spots of existing detectors.

anomaly, large language model, machine learning, (21 more...)

2510.03904

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Neural Information Processing Systems, Oct-3-2025, 02:27:45 GMT

Neural Information Processing Systems http://nips.cc/

data mining, detection, machine learning, (18 more...)

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Rajabinasab, Muhammad, Pakdaman, Farhad, Gabbouj, Moncef, Schneider-Kamp, Peter, Zimek, Arthur

Randomized PCA Forest for Outlier Detection

arXiv.org Machine Learning, Aug-25-2025

We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Inspired by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we develop a novel unsupervised outlier detection method that utilizes RPCA Forest for outlier detection. Experimental results showcase the superiority of the proposed approach compared to the classical and state-of-the-art methods in performing the outlier detection task on several datasets while performing competitively on the rest. The extensive analysis of the proposed method reflects its high generalization power and its computational efficiency, highlighting it as a good choice for unsupervised outlier detection. An outlier, as defined by Hawkins [18], is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Similarly, Barnett and Lewis [3] describe it as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data." Outlier detection is the process of identifying such outliers, i.e., the data points which differ from the rest of the data. It is one of the most important and fundamental tasks in data mining and machine learning with applications in intrusion detection [20], fault detection [37], fraud detection [7] and others [11], [13], [27]. In recent years, many methods have been proposed to carry out the outlier detection task [1], [9], [10], [23], [42]. Despite the demonstration of promising results, further studies show that these results might be limited only to specific instances of the problem (e.g., a limited selection of datasets, a specific kind of outliers, etc.) [6].

artificial intelligence, data mining, machine learning, (16 more...)

2508.12776

Country:

North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland (0.04)
Europe > Denmark > Southern Denmark (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Law Enforcement & Public Safety (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Rahal, Manal, Ahmed, Bestoun S., Renstrom, Roger, Stener, Robert, Wurtz, Albrecht

Data-Driven Heat Pump Management: Combining Machine Learning with Anomaly Detection for Residential Hot Water Systems

arXiv.org Artificial Intelligence, Jun-23-2025

Heat pumps (HPs) have emerged as a cost-effective and clean technology for sustainable energy systems, but their efficiency in producing hot water remains restricted by conventional threshold-based control methods. Although machine learning (ML) has been successfully implemented for various HP applications, optimization of household hot water demand forecasting remains understudied. This paper addresses this problem by introducing a novel approach that combines predictive ML with anomaly detection to create adaptive hot water production strategies based on household-specific consumption patterns. Our key contributions include: (1) a composite approach combining ML and isolation forest (iForest) to forecast household demand for hot water and steer responsive HP operations; (2) multi-step feature selection with advanced time-series analysis to capture complex usage patterns; (3) application and tuning of three ML models: Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bi-directional LSTM with the self-attention mechanism on data from different types of real HP installations; and (4) experimental validation on six real household installations. Our experiments show that the best-performing model LightGBM achieves superior performance, with RMSE improvements of up to 9.37% compared to LSTM variants with $R^2$ values between 0.748 and 0.983. For anomaly detection, our iForest implementation achieved an F1-score of 0.87 with a false alarm rate of only 5.2%, demonstrating strong generalization capabilities across different household types and consumption patterns, making it suitable for real-world HP deployments.
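
For readers less familiar with the two anomaly detection metrics quoted above, the following sketch shows how they are commonly computed from confusion counts. The abstract does not state the exact definitions used, so the false alarm rate is assumed here to be the share of normal samples flagged as anomalous. The counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


def false_alarm_rate(fp, tn):
    """Share of normal samples flagged as anomalous (assumed definition)."""
    return fp / (fp + tn)


# Hypothetical confusion counts, not taken from the paper.
tp, fp, fn, tn = 8, 2, 2, 88
print(f1_score(tp, fp, fn), false_alarm_rate(fp, tn))
```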

data mining, household, machine learning, (17 more...)

2506.15719

Country:

Europe (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.93)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Direct Use of Geothermal Energy > Geothermal Heating, Ventilation, and Air Conditioning (HVAC) System (0.62)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, May-20-2025

Theoretical Investigation on Inductive Bias of Isolation Forest

Zheng, Qin-Cheng, Zhang, Shao-Qun, Lyu, Shen-Huan, Jiang, Yuan, Zhou, Zhi-Hua

Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector valued for its exceptional runtime efficiency and performance on large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper theoretically investigates the conditions and extent of iForest's effectiveness by analyzing its inductive bias through the formulation of depth functions and growth processes. Since directly analyzing the depth function proves intractable due to iForest's random splitting mechanism, we model the growth process of iForest as a random walk, enabling us to derive the expected depth function using transition probabilities. Our case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor anomaly detectors. Our study provides theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
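
The depth that the abstract analyses can be made concrete with a small simulation. The sketch below is not the paper's random-walk model: it simply estimates, by Monte Carlo, how many uniformly random splits are needed to isolate a point in one-dimensional data, which is the quantity an isolation tree records. Subsampling and the anomaly score normalisation are omitted, and the names `isolation_depth` and `mean_depth` are hypothetical.

```python
import random


def isolation_depth(x, data, rng, max_depth=64):
    """Number of random splits needed to separate x from the rest of 1-D data."""
    points = list(data)
    depth = 0
    while len(points) > 1 and depth < max_depth:
        lo, hi = min(points), max(points)
        if lo == hi:  # only identical values remain; no further split is possible
            break
        split = rng.uniform(lo, hi)
        # Keep only the points that fall on the same side of the split as x.
        points = [p for p in points if (p < split) == (x < split)]
        depth += 1
    return depth


def mean_depth(x, data, n_trees, rng):
    """Average isolation depth of x over n_trees independent random trees."""
    return sum(isolation_depth(x, data, rng) for _ in range(n_trees)) / n_trees


# 21 evenly spaced points in [0, 1] plus one far-away point at 10.
data = [i / 20 for i in range(21)] + [10.0]
rng = random.Random(0)
central = mean_depth(0.5, data, 200, rng)
outlier = mean_depth(10.0, data, 200, rng)
print(central, outlier)
```

The far-away point is typically cut off by the first split, while a point in the middle of the cluster needs at least two splits, one on each side. A shorter expected depth is what iForest interprets as evidence of an anomaly.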

anomaly, data mining, machine learning, (20 more...)

2505.12825

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Monemizadeh, Vahideh, Kiani, Kourosh

Detecting Anomalies Using Rotated Isolation Forest

arXiv.org Artificial Intelligence, Jan-29-2025

The Isolation Forest (iForest), proposed by Liu, Ting, and Zhou at TKDE 2012, has become a prominent tool for unsupervised anomaly detection. However, recent research by Hariri, Kind, and Brunner, published in TKDE 2021, has revealed issues with iForest. They identified the presence of axis-aligned ghost clusters that can be misidentified as normal clusters, leading to biased anomaly scores and inaccurate predictions. In response, they developed the Extended Isolation Forest (EIF), which effectively solves these issues by eliminating the ghost clusters introduced by iForest. This enhancement results in improved consistency of anomaly scores and superior performance. We reveal a previously overlooked problem in the Extended Isolation Forest (EIF), showing that it is vulnerable to ghost inter-clusters between normal clusters of data points. In this paper, we introduce the Rotated Isolation Forest (RIF) algorithm which effectively addresses both the axis-aligned ghost clusters observed in iForest and the ghost inter-clusters seen in EIF. RIF accomplishes this by randomly rotating the dataset (using random rotation matrices and QR decomposition) before feeding it into the iForest construction, thereby increasing dataset variation and eliminating ghost clusters. Our experiments conclusively demonstrate that the RIF algorithm outperforms iForest and EIF, as evidenced by the results obtained from both synthetic datasets and real-world datasets.
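
The rotation step mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it draws a Gaussian matrix, orthonormalises it with Gram-Schmidt (which gives the orthogonal factor of a QR decomposition), and flips one row if needed so that the result is a proper rotation. The helper names `determinant`, `random_rotation`, and `rotate` are hypothetical, and the subsequent iForest construction on the rotated data is omitted.

```python
import math
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if m[pivot][i] == 0.0:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return det


def random_rotation(dim, rng):
    """Random dim x dim rotation matrix, returned as a list of rows."""
    rows = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows already accepted.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    # Flip one row if needed so the determinant is +1 (rotation, not reflection).
    if determinant(rows) < 0:
        rows[0] = [-a for a in rows[0]]
    return rows


def rotate(points, rotation):
    """Apply the rotation matrix to every point."""
    return [[sum(r * x for r, x in zip(row, p)) for row in rotation] for p in points]


rng = random.Random(0)
points = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
rotated = rotate(points, random_rotation(2, rng))
print(rotated)
```

Because a rotation preserves distances, the rotated data keeps the same geometry while the axis-parallel splits of an isolation tree now cut it along different directions.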

artificial intelligence, data mining, machine learning, (18 more...)

2501.17787

Country:

Europe > Switzerland (0.04)
Asia > Singapore (0.04)
North America > United States > New Jersey (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.78)
(2 more...)