Randomized PCA Forest for Outlier Detection

Rajabinasab, Muhammad, Pakdaman, Farhad, Gabbouj, Moncef, Schneider-Kamp, Peter, Zimek, Arthur

Aug-25-2025–arXiv.org Machine Learning

--We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Inspired by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we develop a novel unsupervised outlier detection method that utilizes RPCA Forest for outlier detection. Experimental results showcase the superiority of the proposed approach compared to the classical and state-of-the-art methods in performing the outlier detection task on several datasets while performing competitively on the rest. The extensive analysis of the proposed method reflects it high generalization power and its computational efficiency, highlighting it as a good choice for unsupervised outlier detection. An outlier, as defined by Hawkins [18], is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism." Similarly, Barnett and Lewis [3] describe it as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data." Outlier detection is the process of identifying such outliers, i.e., the data points which differ from the rest of the data. It is one of the most important and fundamental tasks in data mining and machine learning with applications in intrusion detection [20], fault detection [37], fraud detection [7] and others [11], [13], [27]. In recent years, many methods have been proposed to carry out the outlier detection task [1], [9], [10], [23], [42]. Despite the demonstration of promising results, further studies show that these results might be limited only to specific instances of the problem (e.g., a limited selection of datasets, a specific kind of outliers, etc.) [6].

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

Aug-25-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Wisconsin (0.04)
- Europe
  - Finland (0.04)
  - Denmark > Southern Denmark (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report
  - New Finding (0.93)
  - Experimental Study (0.67)

Industry:
- Health & Medicine > Therapeutic Area (0.94)
- Law Enforcement & Public Safety (0.68)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Nearest Neighbor Methods (0.69)
    - Performance Analysis > Accuracy (0.68)