AITopics | outlier detection

What Drives the Inlier-Memorization Effect? A Theory of Outlier Detection via Early Training Dynamics

Kim, Kunwoong, Kim, Dongha

arXiv.org Machine Learning, Jun-30-2026

Outlier detection (OD) aims to identify anomalous instances by learning the underlying structure of normal data (inliers), and is particularly challenging in fully unsupervised settings where no information about anomalies is available during training. Recent advances have leveraged the inlier-memorization (IM) effect, a phenomenon in which deep models memorize inlier patterns earlier than those of outliers, as a powerful signal for distinguishing outliers. However, despite its empirical success, the theoretical understanding of the IM effect remains limited. In this work, we present a theoretical study of the IM effect. Focusing on a simple autoencoder, we show that, under mild assumptions, the model can successfully memorize inliers while failing to memorize outliers during certain stages of early training. In particular, we characterize not only the emergence of the IM effect, but also its strength and persistence, and analyze how these properties depend on the data distribution and parameter initialization. In addition, building on these insights, we derive simple yet practical guidelines for enhancing the IM effect, including data preprocessing and parameter initialization schemes, achieving state-of-the-art performance on the ADBench datasets. Our findings provide a theoretical foundation for the IM effect and offer actionable directions for improving IM-based outlier detection methods.
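As a rough illustration of the effect this abstract describes, the sketch below trains a rank-1 tied linear autoencoder with plain gradient descent and scores each point by its reconstruction error. It is not the authors' model or analysis: the function names, the toy 2-D data, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import random

def train_rank1_autoencoder(data, steps=60, lr=0.05, seed=0):
    """Gradient descent on the mean of 0.5 * ||w (w.x) - x||^2 (toy setup)."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]  # small random init (assumed)
    for _ in range(steps):
        grad = [0.0] * dim
        for x in data:
            z = sum(wi * xi for wi, xi in zip(w, x))   # 1-D code
            r = [wi * z - xi for wi, xi in zip(w, x)]   # reconstruction residual
            rw = sum(ri * wi for ri, wi in zip(r, w))
            for j in range(dim):
                # d/dw_j of 0.5 * ||w z - x||^2 with z = w.x
                grad[j] += r[j] * z + rw * x[j]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

def reconstruction_errors(data, w):
    """Squared reconstruction error per point, used as the outlier score."""
    errors = []
    for x in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        errors.append(sum((wi * z - xi) ** 2 for wi, xi in zip(w, x)))
    return errors

# Toy data (assumed): inliers lie on the line y = x, one outlier lies off it.
inliers = [(0.2 * k, 0.2 * k) for k in range(-10, 11)]
outlier = (2.0, -2.0)
data = inliers + [outlier]

w = train_rank1_autoencoder(data)
scores = reconstruction_errors(data, w)
flagged = max(range(len(data)), key=scores.__getitem__)  # index of highest score
```

In this toy setting the weight vector aligns with the inlier direction within a few dozen steps, so inliers are reconstructed well while the off-line point keeps a large error. Varying `steps` gives a feel for how the gap between inlier and outlier scores develops during training.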

artificial intelligence, data mining, machine learning, (19 more...)

2606.29791

Country:

Europe (0.92)
North America > United States > California (0.27)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Probabilistic data quality assessment for structural monitoring data via outlier-resistant conditional diffusion model

Li, Qi, Huang, Yong, Li, Hui

arXiv.org Machine Learning, Apr-30-2026

Data quality assessment is an essential step that ensures the reliability of the subsequent structural health monitoring (SHM) tasks. This study proposes a prediction deviation-based SHM data quality assessment method using a univariate implicit auto-regressive model, enabling outlier diagnosis and data cleaning. The proposed conditional diffusion model (CDM) augments the standard diffusion model with a conditional embedding module to incorporate temporal context, quartile normalization to mitigate distribution skew, and a Huber loss to enhance robustness against outliers. Within this univariate implicit autoregressive framework, each data point is assigned an outlier probability, quantifying its degree of "outlier-ness", and a global quality evaluation score is computed to characterize the overall dataset quality. Extensive case studies utilizing operational data from real-world structures demonstrate that the proposed framework significantly improves the accuracy of data quality assessment, outperforming other strong baselines representative of clustering, isolation-based, and deep reconstruction methods. The effectiveness and robustness of the proposed framework are further demonstrated by the findings of ablation experiments and hyperparameter analysis.
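The abstract names a Huber loss as one of its robustness ingredients. The sketch below shows the standard Huber loss on a scalar residual, only to illustrate why it is less sensitive to large deviations than a squared error. The threshold name `delta` and its default value are assumptions; the setting used in the paper is not stated here.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond `delta`."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)

def squared_loss(residual):
    """Plain squared error, for comparison."""
    return 0.5 * residual * residual

small = huber_loss(0.5)    # inside the quadratic region
large = huber_loss(3.0)    # inside the linear region
```

For a residual of 3.0 with `delta=1.0`, the Huber loss grows linearly while the squared loss grows quadratically, so a single extreme reading pulls the fit less.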

artificial intelligence, data quality, machine learning, (19 more...)

doi: 10.1016/j.eswa.2026.132181

2604.26366

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Ground > Rail (0.69)
Energy (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Automatic Unsupervised Outlier Model Selection

Neural Information Processing Systems, Apr-25-2026, 03:11:56 GMT

Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem, and propose METAOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging, as compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. METAOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized metafeatures that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by METAOD significantly outperforms no model selection (e.g.

artificial intelligence, data mining, machine learning, (15 more...)

Country:

Europe (0.46)
North America > United States (0.28)

Genre:

Research Report (0.68)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity

Kuchar, Chris

arXiv.org Machine Learning, Mar-17-2026

Breiman and Cutler's original Random Forest was designed as a unified ML engine -- not merely an ensemble predictor. Their implementation included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization -- capabilities that modern libraries like scikit-learn never implemented. RFX-Fuse (Random Forests X [X=compression] -- Forest Unified Learning and Similarity Engine) delivers Breiman and Cutler's complete vision with native GPU/CPU support. Modern ML pipelines require 5+ separate tools -- XGBoost for prediction, FAISS for similarity, SHAP for explanations, Isolation Forest for outliers, custom code for importance. RFX-Fuse provides a 1 to 2 model object alternative -- a single set of trees grown once. Novel Contributions: (1) Proximity Importance -- native explainable similarity: proximity measures that samples are similar; proximity importance explains why. (2) Dataset-specific imputation validation for general tabular data -- ranking imputation methods by how real the imputed data looks, without ground truth labels.

artificial intelligence, data mining, machine learning, (18 more...)

2603.13234

Country: North America > United States > Utah (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.93)

A Practical Algorithm for Distributed Clustering and Outlier Detection

Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang

Neural Information Processing Systems, Feb-15-2026, 08:08:09 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, dataset, outlier, (14 more...)

Country:

North America > United States > Indiana > Monroe County > Bloomington (0.05)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

Neural Information Processing Systems, Feb-12-2026, 18:05:49 GMT

In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets.
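For readers unfamiliar with nearest-neighbor anomaly scoring, the sketch below shows one common variant: each point is scored by the distance to its k-th nearest neighbor. This is not necessarily the exact score the paper analyzes. The function name `knn_scores`, the choice of `k`, and the toy points are assumptions.

```python
import math

def knn_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Toy data (assumed): a tight cluster plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = knn_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Points inside the cluster have a k-th neighbor within distance 1, while the isolated point's k-th neighbor is more than 12 units away, so it receives the highest score. This brute-force version costs O(n^2) distance computations and is meant only for small examples.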

data mining, detection, machine learning, (16 more...)

Country: