AITopics | isolation forest

isolation forest

RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity

Kuchar, Chris

arXiv.org Machine Learning, Mar-17-2026

Breiman and Cutler's original Random Forest was designed as a unified ML engine -- not merely an ensemble predictor. Their implementation included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization -- capabilities that modern libraries like scikit-learn never implemented. RFX-Fuse (Random Forests X [X=compression] -- Forest Unified Learning and Similarity Engine) delivers Breiman and Cutler's complete vision with native GPU/CPU support. Modern ML pipelines require 5+ separate tools -- XGBoost for prediction, FAISS for similarity, SHAP for explanations, Isolation Forest for outliers, custom code for importance. RFX-Fuse provides a 1 to 2 model object alternative -- a single set of trees grown once. Novel Contributions: (1) Proximity Importance -- native explainable similarity: proximity measures that samples are similar; proximity importance explains why. (2) Dataset-specific imputation validation for general tabular data -- ranking imputation methods by how real the imputed data looks, without ground truth labels.
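The proximity idea mentioned above can be illustrated with a small sketch. It assumes the classical Breiman and Cutler definition, in which the proximity of two samples is the fraction of trees where both land in the same terminal node. The function name `proximity_matrix` and the `leaf_ids` input layout are hypothetical, chosen for illustration only; this is not RFX-Fuse's API, and it does not attempt the paper's Proximity Importance.

```python
from itertools import combinations


def proximity_matrix(leaf_ids):
    """Return an n x n proximity matrix.

    leaf_ids[t][i] is the terminal-node id that sample i reaches in tree t
    (a hypothetical input layout; a real forest would supply these ids).
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        prox[i][i] = 1.0  # a sample always shares a leaf with itself
    for i, j in combinations(range(n_samples), 2):
        shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Toy leaf assignments: 3 trees, 4 samples (made-up values).
leaves = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 3, 3],
]
P = proximity_matrix(leaves)
```

Here samples 0 and 1 share a leaf in every tree, so their proximity is 1.0, while samples 0 and 3 never do, so theirs is 0.0.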

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2603.13234

Country: North America > United States > Utah (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.93)

A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification

Khaled, Berkani, Rafik, Zeraoulia

arXiv.org Artificial Intelligence, Dec-4-2025

The number and sophistication of cyberthreats have increased along with the internet's exponential expansion, especially those that are spread through malicious URLs. A variety of attacks, such as phishing, drive-by downloads, command-and-control communications, and data exfiltration, are launched using malicious websites. Because attackers are constantly changing URLs to avoid detection, traditional blacklisting techniques are unable to keep up with the dynamic and adversarial character of contemporary threats. As a result, intelligent algorithms that can recognize intricate patterns in URLs and instantly identify malicious ones have become crucial components of contemporary cybersecurity defenses [1, 13]. Because machine learning (ML) and deep learning (DL) approaches can identify non-linear relationships in input data and generalize from observed patterns, they have shown considerable promise in the field of malicious URL detection [2, 3]. However, a number of obstacles remain: class imbalance (a shortage of labeled malicious URLs compared to benign ones); adversarial techniques that produce highly obfuscated or anomalous URLs and undermine the effectiveness of traditional classifiers; and detection systems that are mostly restricted to monolingual user interfaces and lack real-time usability features.

data mining, detection, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.03462

Genre: Research Report (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping

Kadir, Md Abdul, Vasu, Sai Suresh Macharla, Nair, Sidharth S., Sonntag, Daniel

arXiv.org Artificial Intelligence, Dec-3-2025

Auditors rely on Journal Entry Tests (JETs) to detect anomalies in tax-related ledger records, but rule-based methods generate overwhelming false positives and struggle with subtle irregularities. We investigate whether large language models (LLMs) can serve as anomaly detectors in double-entry bookkeeping. Benchmarking SoTA LLMs such as LLaMA and Gemma on both synthetic and real-world anonymized ledgers, we compare them against JETs and machine learning baselines. Our results show that LLMs consistently outperform traditional rule-based JETs and classical ML baselines, while also providing natural-language explanations that enhance interpretability. These results highlight the potential of AI-augmented auditing, where human auditors collaborate with foundation models to strengthen financial integrity.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.02726

Country: Europe > Germany > Saarland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.47)
Banking & Finance (0.47)
Law Enforcement & Public Safety > Fraud (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Unsupervised Anomaly Detection for Smart IoT Devices: Performance and Resource Comparison

Sami, Md. Sad Abdullah, Abid, Mushfiquzzaman

arXiv.org Artificial Intelligence, Dec-1-2025

The rapid expansion of Internet of Things (IoT) deployments across diverse sectors has significantly enhanced operational efficiency, yet concurrently elevated cybersecurity vulnerabilities due to increased exposure to cyber threats. Given the limitations of traditional signature-based Anomaly Detection Systems (ADS) in identifying emerging and zero-day threats, this study investigates the effectiveness of two unsupervised anomaly detection techniques, Isolation Forest (IF) and One-Class Support Vector Machine (OC-SVM), using the TON_IoT thermostat dataset. A comprehensive evaluation was performed based on standard metrics (accuracy, precision, recall, and F1-score) alongside critical resource utilization metrics such as inference time, model size, and peak RAM usage. Experimental results revealed that IF consistently outperformed OC-SVM, achieving higher detection accuracy, superior precision, and recall, along with a significantly better F1-score. Furthermore, Isolation Forest demonstrated a markedly superior computational footprint, making it more suitable for deployment on resource-constrained IoT edge devices. These findings underscore Isolation Forest's robustness in high-dimensional and imbalanced IoT environments and highlight its practical viability for real-time anomaly detection.
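The kind of evaluation described above, detection metrics alongside resource metrics, can be sketched with the standard library alone. The detector `toy_detector`, the readings, and the labels below are made-up stand-ins rather than the paper's models or the TON_IoT data. Note that `tracemalloc` reports peak Python-level allocations, which is only a rough proxy for peak RAM.

```python
import time
import tracemalloc


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def profile_inference(predict, samples):
    """Run predict on each sample; return predictions, seconds, peak bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    preds = [predict(x) for x in samples]
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return preds, elapsed, peak_bytes


def toy_detector(temp_c):
    """Hypothetical stand-in: flag readings more than 5 C away from 21 C."""
    return 1 if abs(temp_c - 21.0) > 5.0 else 0


# Made-up thermostat readings and ground-truth labels.
readings = [20.5, 21.2, 35.0, 19.8, 5.0, 22.1, 27.5]
labels = [0, 0, 1, 0, 1, 0, 0]

preds, seconds, peak = profile_inference(toy_detector, readings)
precision, recall, f1 = precision_recall_f1(labels, preds)
```

Swapping `toy_detector` for a real model's prediction function would let the same harness compare two detectors on equal terms.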

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.21842

Country:

Asia > Bangladesh (0.15)
Europe > Portugal (0.14)
Asia > Indonesia (0.14)

Genre: Research Report > New Finding (0.89)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.35)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Knowledge-based anomaly detection for identifying network-induced shape artifacts

Deshpande, Rucha, Rahman, Tahsin, Lago, Miguel, Subbaswamy, Adarsh, Delfino, Jana G., Zamzmi, Ghada, Thompson, Elim, Badano, Aldo, Kahaki, Seyed

arXiv.org Artificial Intelligence, Nov-10-2025

Synthetic data provides a promising approach to address data scarcity for training machine learning models; however, adoption without proper quality assessments may introduce artifacts, distortions, and unrealistic features that compromise model performance and clinical utility. This work introduces a novel knowledge-based anomaly detection method for detecting network-induced shape artifacts in synthetic images. The introduced method utilizes a two-stage framework comprising (i) a novel feature extractor that constructs a specialized feature space by analyzing the per-image distribution of angle gradients along anatomical boundaries, and (ii) an isolation forest-based anomaly detector. We demonstrate the effectiveness of the method for identifying network-induced shape artifacts in two synthetic mammography datasets from models trained on CSAW-M and VinDr-Mammo patient datasets respectively. Quantitative evaluation shows that the method successfully concentrates artifacts in the most anomalous partition (1st percentile), with AUC values of 0.97 (CSAW-syn) and 0.91 (VMLO-syn). In addition, a reader study involving three imaging scientists confirmed that images identified by the method as containing network-induced shape artifacts were also flagged by human readers with mean agreement rates of 66% (CSAW-syn) and 68% (VMLO-syn) for the most anomalous partition, approximately 1.5-2 times higher than the least anomalous partition. Kendall-Tau correlations between algorithmic and human rankings were 0.45 and 0.43 for the two datasets, indicating reasonable agreement despite the challenging nature of subtle artifact detection. This method is a step forward in the responsible use of synthetic data, as it allows developers to evaluate synthetic images for known anatomic constraints and pinpoint and address specific issues to improve the overall quality of a synthetic dataset.
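To make the Kendall-Tau agreement figure concrete, here is a minimal sketch of the tau-a statistic for two tie-free rankings. The abstract does not say which tau variant was used, so tau-a is an assumption, and the rank lists below are invented for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs, no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up rankings of five images by an algorithm and by a human reader.
algo_rank = [1, 2, 3, 4, 5]
human_rank = [2, 1, 3, 5, 4]
tau = kendall_tau_a(algo_rank, human_rank)  # 8 concordant, 2 discordant pairs
```

With 8 concordant and 2 discordant pairs out of 10, tau is 0.6. A value of 1 would mean identical orderings and -1 fully reversed ones.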

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.04729

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Health & Medicine > Nuclear Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

SHIELD: Securing Healthcare IoT with Efficient Machine Learning Techniques for Anomaly Detection

Desai, Mahek, Rumale, Apoorva, Asadinia, Marjan

arXiv.org Artificial Intelligence, Nov-6-2025

The integration of IoT devices in healthcare introduces significant security and reliability challenges, increasing susceptibility to cyber threats and operational anomalies. This study proposes a machine learning-driven framework for (1) detecting malicious cyberattacks and (2) identifying faulty device anomalies, leveraging a dataset of 200,000 records. Eight machine learning models are evaluated across three learning approaches: supervised learning (XGBoost, K-Nearest Neighbors (K-NN)), semi-supervised learning (Generative Adversarial Networks (GAN), Variational Autoencoders (VAE)), and unsupervised learning (One-Class Support Vector Machine (SVM), Isolation Forest, Graph Neural Networks (GNN), and Long Short-Term Memory (LSTM) Autoencoders). The comprehensive evaluation was conducted across multiple metrics, including F1-score, precision, recall, accuracy, ROC-AUC, and computational efficiency. XGBoost achieved 99% accuracy with minimal computational overhead (0.04s) for anomaly detection, while Isolation Forest balanced precision and recall effectively. LSTM Autoencoders underperformed with lower accuracy and higher latency. For attack detection, K-NN achieved near-perfect precision, recall, and F1-score with the lowest computational cost (0.05s), followed by VAE at 97% accuracy. GAN showed the highest computational cost with the lowest accuracy and ROC-AUC. These findings enhance IoT-enabled healthcare security through effective anomaly detection strategies. By improving early detection of cyber threats and device failures, this framework has the potential to prevent data breaches, minimize system downtime, and ensure the continuous and safe operation of medical devices, ultimately safeguarding patient health and trust in IoT-driven healthcare solutions.

data mining, detection, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/AIIoT65859.2025.11105287

2511.03661

Country: North America > United States (0.15)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Military > Cyberwarfare (0.56)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories

Mastriani, Emilio, Costa, Alessandro, Incardona, Federico, Munari, Kevin, Spinello, Sebastiano

arXiv.org Artificial Intelligence, Nov-3-2025

Objective: ServiMon is designed to offer a scalable and intelligent pipeline for data collection and auditing to monitor distributed astronomical systems such as the ASTRI Mini-Array. The system enhances quality control, predictive maintenance, and real-time anomaly detection for telescope operations. Methods: ServiMon integrates cloud-native technologies-including Prometheus, Grafana, Cassandra, Kafka, and InfluxDB-for telemetry collection and processing. It employs machine learning algorithms, notably Isolation Forest, to detect anomalies in Cassandra performance metrics. Key indicators such as read/write latency, throughput, and memory usage are continuously monitored, stored as time-series data, and preprocessed for feature engineering. Anomalies detected by the model are logged in InfluxDB v2 and accessed via Flux for real-time monitoring and visualization. Results: AI-based anomaly detection increases system resilience by identifying performance degradation at an early stage, minimizing downtime, and optimizing telescope operations. Additionally, ServiMon supports astrostatistical analysis by correlating telemetry with observational data, thus enhancing scientific data quality. AI-generated alerts also improve real-time monitoring, enabling proactive system management. Conclusion: ServiMon's scalable framework proves effective for predictive maintenance and real-time monitoring of astronomical infrastructures. By leveraging cloud and edge computing, it is adaptable to future large-scale experiments, optimizing both performance and cost. The combination of machine learning and big data analytics makes ServiMon a robust and flexible solution for modern and next-generation observational astronomy.

data mining, machine learning, real time system, (13 more...)

arXiv.org Artificial Intelligence

2510.27146

Country: Europe > Italy (0.15)

Genre: Research Report (0.51)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection

Aljaafari, Tala, Kanade, Varun, Torr, Philip, de Witt, Christian Schroeder

arXiv.org Artificial Intelligence, Oct-27-2025

Deploying reinforcement learning (RL) in safety-critical settings is constrained by brittleness under distribution shift. We study out-of-distribution (OOD) detection for RL time series and introduce DEEDEE, a two-statistic detector that revisits representation-heavy pipelines with a minimal alternative. DEEDEE uses only an episodewise mean and an RBF kernel similarity to a training summary, capturing complementary global and local deviations. Despite its simplicity, DEEDEE matches or surpasses contemporary detectors across standard RL OOD suites, delivering a 600-fold reduction in compute (FLOPs / wall-time) and an average 5% absolute accuracy gain over strong baselines. Conceptually, our results indicate that diverse anomaly types often imprint on RL trajectories through a small set of low-order statistics, suggesting a compact foundation for OOD detection in complex environments.
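One possible reading of the two ingredients named above, an episodewise mean and an RBF kernel similarity to a training summary, is sketched below. This is not the authors' implementation: the abstract does not specify how the statistics are combined, so the choice of summary (a mean of per-episode means), the `gamma` value, the scoring rule, and all function names are assumptions made for illustration.

```python
import math


def episode_mean(episode):
    """Average the observation vectors of one episode, dimension by dimension."""
    dim = len(episode[0])
    return [sum(step[d] for step in episode) / len(episode) for d in range(dim)]


def rbf_similarity(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2); equals 1.0 when u == v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Made-up training episodes, each a short list of 2-D observations.
train_episodes = [
    [[0.0, 1.0], [0.2, 0.8]],
    [[0.1, 0.9], [0.1, 1.1]],
]

# Assumed training summary: the mean of the per-episode means.
summary = episode_mean([episode_mean(ep) for ep in train_episodes])


def ood_score(episode, gamma=1.0):
    """Hypothetical score in [0, 1): higher means less similar to training."""
    return 1.0 - rbf_similarity(episode_mean(episode), summary, gamma)


in_dist = [[0.1, 1.0], [0.1, 0.9]]   # resembles the training episodes
shifted = [[2.0, 3.0], [2.2, 2.8]]   # mean is far from the training summary

score_in = ood_score(in_dist)
score_shifted = ood_score(shifted)
```

The in-distribution episode scores near 0 and the shifted one near 1, so a threshold on this score would separate them in this toy case.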

deedee, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.21638

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents

Tang, Pan, Tang, Shixiang, Pu, Huanqi, Miao, Zhiqing, Wang, Zhixing

arXiv.org Artificial Intelligence, Sep-22-2025

This paper presents MicroRCA-Agent, an innovative solution for microservice root cause analysis based on large language model agents, which constructs an intelligent fault root cause localization system with multimodal data fusion. The technical innovations are embodied in three key aspects: First, we combine the pre-trained Drain log parsing algorithm with multi-level data filtering mechanism to efficiently compress massive logs into high-quality fault features. Second, we employ a dual anomaly detection approach that integrates Isolation Forest unsupervised learning algorithms with status code validation to achieve comprehensive trace anomaly identification. Third, we design a statistical symmetry ratio filtering mechanism coupled with a two-stage LLM analysis strategy to enable full-stack phenomenon summarization across node-service-pod hierarchies. The multimodal root cause analysis module leverages carefully designed cross-modal prompts to deeply integrate multimodal anomaly information, fully exploiting the cross-modal understanding and logical reasoning capabilities of large language models to generate structured analysis results encompassing fault components, root cause descriptions, and reasoning trace. Comprehensive ablation studies validate the complementary value of each modal data and the effectiveness of the system architecture. The proposed solution demonstrates superior performance in complex microservice fault scenarios, achieving a final score of 50.71. The code has been released at: https://github.com/tangpan360/MicroRCA-Agent.

data mining, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.15635

Country: Asia > China (0.29)

Genre: Research Report (1.00)

Industry: Energy (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Federated Isolation Forest for Efficient Anomaly Detection on Edge IoT Systems

Vasiljevic, Pavle, Matic, Milica, Popovic, Miroslav

arXiv.org Artificial Intelligence, Sep-5-2025

This post-print is the paper version that was submitted to ZINC 2025. Abstract -- Recently, federated learning frameworks such as Python TestBed for Federated Learning Algorithms and MicroPython TestBed for Federated Learning Algorithms have emerged to tackle user privacy concerns and efficiency in embedded systems. Even more recently, an efficient federated anomaly detection algorithm, FLiForest, based on Isolation Forests has been developed, offering a low-resource, unsupervised method well-suited for edge deployment and continuous learning. In this paper, we present an application of Isolation Forest-based temperature anomaly detection, developed using the previously mentioned federated learning frameworks, aimed at small edge devices and IoT systems running MicroPython. The system has been experimentally evaluated, achieving over 96% accuracy in distinguishing normal from abnormal readings and above 78% precision in detecting anomalies across all tested configurations, while maintaining a memory usage below 160 KB during model training.
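The general idea of isolation-forest scoring on temperature readings, with trees trained on separate clients and pooled afterwards, can be sketched in plain Python. This is not FLiForest or the TestBed frameworks' API. Pooling by simple list concatenation is an assumed stand-in for the federated step, and the client data, seeds, and function names are all hypothetical. The scoring formula follows the standard isolation forest definition, s = 2^(-E[h(x)] / c(n)).

```python
import math
import random


def build_tree(values, depth, max_depth, rng):
    """Recursively split 1-D values at a random point until isolated or capped."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf node
    split = rng.uniform(min(values), max(values))
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus the leaf-size adjustment c(size)."""
    if "split" not in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def train_local(values, n_trees, sample_size, seed):
    """Train trees on one client's readings (hypothetical local step)."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(sample_size))
    return [build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
            for _ in range(n_trees)]


def score(forest, x, sample_size):
    """Anomaly score in (0, 1]; values closer to 1 are more anomalous."""
    mean_path = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_path / c(sample_size))


# Made-up readings for two clients, both around 21 C.
data_rng = random.Random(0)
client_a = [21.0 + data_rng.gauss(0, 0.5) for _ in range(64)]
client_b = [21.0 + data_rng.gauss(0, 0.5) for _ in range(64)]

SAMPLE = 32
# Assumed aggregation: the server simply pools the trees from every client.
global_forest = (train_local(client_a, 10, SAMPLE, seed=1)
                 + train_local(client_b, 10, SAMPLE, seed=2))

anomalous = score(global_forest, 45.0, SAMPLE)  # far outside the usual range
typical = score(global_forest, 21.0, SAMPLE)    # near the centre of the data
```

A reading of 45.0 is isolated after only a few splits in every tree, so its score comes out higher than that of a typical 21.0 reading.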

artificial intelligence, data mining, machine learning, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ZINC65316.2025.11103552

2506.05138

Country: Europe > Serbia (0.30)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Smart Houses & Appliances (0.84)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
