AITopics | outlier detection method

outlier detection method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Capturing the Denoising Effect of PCA via Compression Ratio

Neural Information Processing Systems, Feb-10-2026, 11:23:57 GMT

In this paper, we propose a novel metric called compression ratio to capture the effect of PCA on high-dimensional noisy data.

artificial intelligence, compression ratio, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

A method for outlier detection based on cluster analysis and visual expert criteria

Lara, Juan A., Lizcano, David, Rampérez, Víctor, Soriano, Javier

arXiv.org Artificial Intelligence, Oct-28-2025

Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier detection, often as a preliminary step in order to filter out outliers and build more representative models. In this paper, we propose an outlier detection method based on a clustering process. The aim behind the proposal outlined in this paper is to overcome the specificity of many existing outlier detection techniques that fail to take into account the inherent dispersion of domain objects. The outlier detection method is based on four criteria designed to represent how human beings (experts in each domain) visually identify outliers within a set of objects after analysing the clusters. This has an advantage over other clustering-based outlier detection techniques that are founded on a purely numerical analysis of clusters. Our proposal has been evaluated, with satisfactory results, on data (particularly time series) from two different domains: stabilometry, a branch of medicine studying balance-related functions in human beings, and electroencephalography (EEG), a neurological exploration used to diagnose nervous system disorders. To validate the proposed method, we studied its outlier detection performance and its efficiency in terms of runtime. The results of regression analyses confirm that our proposal is useful for detecting outlier data in different domains, with a false positive rate of less than 2% and a reliability greater than 99%.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1111/exsy.12473

2510.23136

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (0.87)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (0.88)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Two stage GNSS outlier detection for factor graph optimization based GNSS-RTK/INS/odometer fusion

Song, Baoshan, Yan, Penggao, Xia, Xiao, Zhong, Yihan, Wen, Weisong, Hsu, Li-Ta

arXiv.org Artificial Intelligence, Oct-2-2025

Reliable GNSS positioning in complex environments remains a critical challenge due to non-line-of-sight (NLOS) propagation, multipath effects, and frequent signal blockages. These effects can easily introduce large outliers into the raw pseudo-range measurements, which significantly degrade the performance of global navigation satellite system (GNSS) real-time kinematic (RTK) positioning and limit the effectiveness of tightly coupled GNSS-based integrated navigation systems. To address this issue, we propose a two-stage outlier detection method and apply the method in a tightly coupled GNSS-RTK, inertial navigation system (INS), and odometer integration based on factor graph optimization (FGO). In the first stage, Doppler measurements are employed to detect pseudo-range outliers in a GNSS-only manner, since Doppler is less sensitive to multipath and NLOS effects compared with pseudo-range, making it a more stable reference for detecting sudden inconsistencies. In the second stage, pre-integrated inertial measurement units (IMU) and odometer constraints are used to generate predicted double-difference pseudo-range measurements, which enable a more refined identification and rejection of remaining outliers. By combining these two complementary stages, the system achieves improved robustness against both gross pseudo-range errors and degraded satellite measuring quality. The experimental results demonstrate that the two-stage detection framework significantly reduces the impact of pseudo-range outliers, and leads to improved positioning accuracy and consistency compared with representative baseline approaches. In the deep urban canyon test, the outlier mitigation method reduces the RMSE of GNSS-RTK/INS/odometer fusion from 0.52 m to 0.30 m, a 42.3% improvement.
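The idea behind a Doppler-based first stage can be illustrated with a simple per-satellite consistency check: between two epochs, the change in pseudo-range should roughly match the range rate implied by the Doppler shift. The sketch below is only an illustration under stated assumptions, not the detector from the paper: the function name, the 5 m threshold, the GPS L1 wavelength, and the sign convention (positive Doppler means the satellite is approaching) are all assumptions, and receiver clock drift is ignored.

```python
# GPS L1 carrier wavelength in metres (speed of light / 1575.42 MHz); assumed signal.
L1_WAVELENGTH_M = 299_792_458.0 / 1_575.42e6

def flag_pseudorange_outliers(prev_pr, curr_pr, doppler_hz, dt, threshold_m=5.0):
    """Flag satellites whose pseudo-range change disagrees with Doppler.

    prev_pr, curr_pr: pseudo-ranges (m) at two consecutive epochs, one per satellite.
    doppler_hz: Doppler shift (Hz) per satellite, assumed constant over dt.
    dt: time between the epochs (s).
    threshold_m: hypothetical tolerance on the residual (m).
    """
    flags = []
    for p0, p1, fd in zip(prev_pr, curr_pr, doppler_hz):
        range_rate = -L1_WAVELENGTH_M * fd      # m/s; positive Doppler shortens the range
        predicted_delta = range_rate * dt       # expected pseudo-range change (m)
        residual = (p1 - p0) - predicted_delta  # disagreement between the two observables
        flags.append(abs(residual) > threshold_m)
    return flags
```

A practical detector would also have to remove the receiver clock drift common to all satellites (for example by subtracting the median residual) before thresholding.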

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.00524

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.64)

Robustness and accuracy of mean opinion scores with hard and soft outlier detection

Saupe, Dietmar, Bleile, Tim

arXiv.org Artificial Intelligence, Sep-9-2025

In subjective assessment of image and video quality, observers rate or compare selected stimuli. Before calculating the mean opinion scores (MOS) for these stimuli from the ratings, it is recommended to identify and deal with outliers that may have given unreliable ratings. Several methods are available for this purpose, some of which have been standardized. These methods are typically based on statistics and sometimes tested by introducing synthetic ratings from artificial outliers, such as random clickers. However, a reliable and comprehensive approach is lacking for comparative performance analysis of outlier detection methods. To fill this gap, this work proposes and applies an empirical worst-case analysis as a general solution. Our method involves evolutionary optimization of an adversarial black-box attack on outlier detection algorithms, where the adversary maximizes the distortion of scale values with respect to ground truth. We apply our analysis to several hard and soft outlier detection methods for absolute category ratings and show their differing performance in this stress test. In addition, we propose two new outlier detection methods with low complexity and excellent worst-case performance. Software for adversarial attacks and data analysis is available.
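To make the "hard" outlier-handling step concrete, here is a minimal sketch of one simple screening rule: drop every observer whose ratings correlate poorly with the mean of the remaining observers, then average what is left. This is not one of the standardized methods or the two new methods the abstract mentions; the function names and the 0.75 correlation threshold are assumptions chosen for illustration.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def mos_with_hard_screening(ratings, min_corr=0.75):
    """ratings[i][s] is observer i's rating of stimulus s.

    Returns (mos, kept): per-stimulus mean opinion scores computed from the
    observers that pass the screening, and the indices of those observers.
    """
    n_obs, n_stim = len(ratings), len(ratings[0])
    kept = []
    for i, own in enumerate(ratings):
        # Mean rating of all *other* observers for each stimulus.
        others = [mean(ratings[j][s] for j in range(n_obs) if j != i)
                  for s in range(n_stim)]
        if pearson(own, others) >= min_corr:
            kept.append(i)
    if not kept:  # nobody passed: fall back to using every observer
        kept = list(range(n_obs))
    mos = [mean(ratings[i][s] for i in kept) for s in range(n_stim)]
    return mos, kept
```

A "soft" method would instead down-weight unreliable observers rather than remove them outright.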

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.06554

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.38)
Government > Military (0.38)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Leveraging the Christoffel Function for Outlier Detection in Data Streams

Ducharlet, Kévin, Travé-Massuyès, Louise, Lasserre, Jean-Bernard, Lann, Marie-Véronique Le, Miloudi, Youssef

arXiv.org Artificial Intelligence, Aug-26-2025

Outlier detection holds significant importance in the realm of data mining, particularly with the growing pervasiveness of data acquisition methods. The ability to identify outliers in data streams is essential for maintaining data quality and detecting faults. However, dealing with data streams presents challenges due to the non-stationary nature of distributions and the ever-increasing data volume. While numerous methods have been proposed to tackle this challenge, a common drawback is the lack of straightforward parameterization in many of them. This article introduces two novel methods: DyCF and DyCG. DyCF leverages the Christoffel function from the theory of approximation and orthogonal polynomials. Conversely, DyCG capitalizes on the growth properties of the Christoffel function, eliminating the need for tuning parameters. Both approaches are firmly rooted in a well-defined algebraic framework, meeting crucial demands for data stream processing, with a specific focus on addressing low-dimensional aspects and maintaining data history without memory cost. A comprehensive comparison between DyCF, DyCG, and state-of-the-art methods is presented, using both synthetic and real industrial data streams. The results show that DyCF outperforms fine-tuning methods, offering superior performance in terms of execution time and memory usage. DyCG performs less well, but has the considerable advantage of requiring no tuning at all.
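For readers unfamiliar with the Christoffel function, the following sketch shows the standard empirical score it gives rise to: with v(x) the vector of monomials up to a chosen degree and M the empirical moment matrix (the average of v(x_i) v(x_i)^T), the score is v(x)^T M^{-1} v(x), and large values indicate points far from the bulk of the data. The sketch is one-dimensional and uses only the standard library; the class and function names are assumptions, and it does not reproduce DyCF's thresholding or regularization, which the abstract does not describe. Because M is built from a running sum of outer products, past samples need not be stored.

```python
def monomials(x, degree):
    """Vector [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

class ChristoffelScorer:
    """Streaming 1-D scorer based on the empirical (inverse) Christoffel function."""

    def __init__(self, degree=2):
        self.degree = degree
        size = degree + 1
        self.outer_sum = [[0.0] * size for _ in range(size)]  # running sum of v v^T
        self.count = 0

    def update(self, x):
        v = monomials(x, self.degree)
        for i, vi in enumerate(v):
            for j, vj in enumerate(v):
                self.outer_sum[i][j] += vi * vj
        self.count += 1

    def score(self, x):
        """v(x)^T M^{-1} v(x); needs at least degree + 1 distinct samples seen."""
        v = monomials(x, self.degree)
        moment = [[s / self.count for s in row] for row in self.outer_sum]
        z = solve(moment, v)
        return sum(vi * zi for vi, zi in zip(v, z))
```

A useful sanity check is that the score averaged over the samples used to build M equals the number of monomials (degree + 1), which suggests thresholds expressed as a multiple of that number.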

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s41060-024-00581-2

2508.16617

Country:

North America > United States (0.46)
Europe (0.46)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Randomized PCA Forest for Outlier Detection

Rajabinasab, Muhammad, Pakdaman, Farhad, Gabbouj, Moncef, Schneider-Kamp, Peter, Zimek, Arthur

arXiv.org Machine Learning, Aug-25-2025

We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Inspired by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we develop a novel unsupervised outlier detection method that utilizes RPCA Forest for outlier detection. Experimental results showcase the superiority of the proposed approach compared to the classical and state-of-the-art methods in performing the outlier detection task on several datasets while performing competitively on the rest. The extensive analysis of the proposed method reflects its high generalization power and its computational efficiency, highlighting it as a good choice for unsupervised outlier detection. An outlier, as defined by Hawkins [18], is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Similarly, Barnett and Lewis [3] describe it as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data." Outlier detection is the process of identifying such outliers, i.e., the data points which differ from the rest of the data. It is one of the most important and fundamental tasks in data mining and machine learning with applications in intrusion detection [20], fault detection [37], fraud detection [7] and others [11], [13], [27]. In recent years, many methods have been proposed to carry out the outlier detection task [1], [9], [10], [23], [42]. Despite the demonstration of promising results, further studies show that these results might be limited only to specific instances of the problem (e.g., a limited selection of datasets, a specific kind of outliers, etc.) [6].

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2508.12776

Country:

North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland (0.04)
Europe > Denmark > Southern Denmark (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Law Enforcement & Public Safety (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

An empirical comparison of some outlier detection methods with longitudinal data

D'Orazio, Marcello

arXiv.org Artificial Intelligence, Jul-30-2025

This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance between observations or binary partitioning trees. This is achieved by applying the methods to panel survey data related to different types of statistical units. Traditional methods are quite simple, enabling the direct identification of potential outliers, but they require specific assumptions. In contrast, recent methods provide only a score whose magnitude is directly related to the likelihood of an outlier being present. All the methods require the user to set a number of tuning parameters. However, the most recent methods are more flexible and sometimes more effective than traditional methods. In addition, these methods can be applied to multidimensional data.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.21203

Country: Europe > Austria (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.68)

Kernel Outlier Detection

Dağıdır, Can Hakan, Hubert, Mia, Rousseeuw, Peter J.

arXiv.org Machine Learning, Jul-1-2025

A new anomaly detection method called kernel outlier detection (KOD) is proposed. It is designed to address challenges of outlier detection in high-dimensional settings. The aim is to overcome limitations of existing methods, such as dependence on distributional assumptions or on hyperparameters that are hard to tune. KOD starts with a kernel transformation, followed by a projection pursuit approach. Its novelties include a new ensemble of directions to search over, and a new way to combine results of different direction types. This provides a flexible and lightweight approach for outlier detection. Our empirical evaluations illustrate the effectiveness of KOD on three small datasets with challenging structures, and on four large benchmark datasets.
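The projection pursuit step can be illustrated with the classical Stahel-Donoho style outlyingness: project the data onto many directions and, for each point, keep the largest robustly standardized deviation (distance from the median divided by the MAD). The sketch below covers only that classical ingredient for 2-D points; it omits KOD's kernel transformation and its particular ensemble of direction types, which the abstract does not spell out. The function name and the evenly spaced direction grid are assumptions.

```python
import math
from statistics import median

def projection_outlyingness(points, n_directions=36):
    """Max over directions of |projection - median| / MAD, for 2-D points."""
    scores = [0.0] * len(points)
    for k in range(n_directions):
        angle = math.pi * k / n_directions       # directions covering a half circle
        ux, uy = math.cos(angle), math.sin(angle)
        proj = [x * ux + y * uy for x, y in points]
        med = median(proj)
        mad = median(abs(p - med) for p in proj)  # median absolute deviation
        if mad == 0:
            continue                              # degenerate direction, skip it
        for i, p in enumerate(proj):
            scores[i] = max(scores[i], abs(p - med) / mad)
    return scores
```

Points with a large score are far from the bulk of the data along at least one direction; where to cut off is left to the user.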

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.52933/jdssv.v5i8.152

2506.22994

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Huang, Hanxun, Erfani, Sarah, Li, Yige, Ma, Xingjun, Bailey, James

arXiv.org Artificial Intelligence, Feb-10-2025

Contrastive language-image pretraining (CLIP) has been found to be vulnerable to poisoning backdoor attacks where the adversary can achieve an almost perfect attack success rate on CLIP models by poisoning only 0.01% of the training dataset. This raises security concerns on the current practice of pretraining large-scale models on unscrutinized web data using CLIP. In this work, we analyze the representations of backdoor-poisoned samples learned by CLIP models and find that they exhibit unique characteristics in their local subspace, i.e., their local neighborhoods are far more sparse than those of clean samples. Based on this finding, we conduct a systematic study on detecting CLIP backdoor attacks and show that these attacks can be easily and efficiently detected by traditional density ratio-based local outlier detectors, whereas existing backdoor sample detection methods fail. Our experiments also reveal that an unintentional backdoor already exists in the original CC3M dataset and has been trained into a popular open-source model released by OpenCLIP. Based on our detector, one can clean up a million-scale web dataset (e.g., CC3M) efficiently within 15 minutes using 4 Nvidia A100 GPUs. The code is publicly available in our GitHub repository: https://github.com/HanxunH/Detect-CLIP-Backdoor-Samples
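A density ratio-based local outlier score compares how sparse a point's neighborhood is with how sparse its neighbors' neighborhoods are. The sketch below is a simplified variant of the Local Outlier Factor on toy 2-D points: each point's distance to its k-th nearest neighbor is divided by the mean of the same quantity over its k nearest neighbors. It is not the detector or the CLIP embedding pipeline from the paper's repository; the function names and k=3 are assumptions.

```python
import math

def knn(points, i, k):
    """The k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return dists[:k]

def local_density_ratio(points, k=3):
    """Score near 1: density similar to the neighbours'; well above 1: locally sparse."""
    neighbours = [knn(points, i, k) for i in range(len(points))]
    kdist = [nb[-1][0] for nb in neighbours]  # distance to the k-th nearest neighbour
    scores = []
    for i, nb in enumerate(neighbours):
        mean_nb_kdist = sum(kdist[j] for _, j in nb) / len(nb)
        scores.append(kdist[i] / mean_nb_kdist if mean_nb_kdist > 0 else float("inf"))
    return scores
```

This brute-force version is quadratic in the number of points; at the scale described in the abstract, an approximate or GPU-based nearest-neighbor search would be needed.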

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.01385

Country:

Asia > China (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > Nepal (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)
