SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets

Naghavi-Nozad, Sayyed-Ahmad, Haeri, Maryam Amir, Folino, Gianluigi

Oct-26-2020, arXiv.org Machine Learning

This paper presents a batch-wise, density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all the data is memory-resident, the proposed method is scalable and processes the input data chunk by chunk within the confines of a limited memory buffer. In the first phase, a temporary clustering model is built and then incrementally updated by analyzing consecutive memory loads of points. At the end of this scalable clustering phase, the approximate structure of the original clusters is obtained. Finally, in another scan of the entire dataset, each object is assigned an outlying score according to a suitable criterion; this score is called SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low, linear time complexity and is more effective and efficient than both the best-known conventional density-based methods, which need to load all data into memory, and some fast distance-based methods, which can operate on disk-resident data.
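The two-scan, chunk-wise pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' SDCOR algorithm: a simple leader-style nearest-centroid assignment stands in for the density-based clustering, and the Mahalanobis distance to the closest cluster summary stands in for the paper's scoring criterion. All names and parameters (`ClusterSummary`, `fit_summaries`, `score_chunks`, `radius`, `min_size`) are hypothetical.

```python
import numpy as np


class ClusterSummary:
    """Running count, mean and scatter matrix of one cluster (assumed design)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))

    def update(self, points):
        # Merge a batch of points into the running statistics without
        # keeping the points themselves in memory.
        m = len(points)
        if m == 0:
            return
        batch_mean = points.mean(axis=0)
        centered = points - batch_mean
        delta = batch_mean - self.mean
        total = self.n + m
        self.scatter += centered.T @ centered + np.outer(delta, delta) * self.n * m / total
        self.mean += delta * m / total
        self.n = total

    def covariance(self, ridge=1e-6):
        # A small ridge keeps the matrix invertible for degenerate clusters.
        dim = len(self.mean)
        return self.scatter / max(self.n - 1, 1) + ridge * np.eye(dim)


def fit_summaries(chunks, radius, min_size=10):
    """First scan: update cluster summaries one memory load at a time."""
    summaries = []
    for chunk in chunks:
        for x in np.asarray(chunk, dtype=float):
            if summaries:
                dists = [np.linalg.norm(x - s.mean) for s in summaries]
                j = int(np.argmin(dists))
                if dists[j] <= radius:
                    summaries[j].update(x[None, :])
                    continue
            # No summary is close enough: start a new one (leader-style).
            new = ClusterSummary(len(x))
            new.update(x[None, :])
            summaries.append(new)
    # Tiny summaries are treated as noise, not as cluster structure.
    return [s for s in summaries if s.n >= min_size]


def score_chunks(chunks, summaries):
    """Second scan: score each point by its distance to the closest summary."""
    inv_covs = [np.linalg.inv(s.covariance()) for s in summaries]
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        per_cluster = []
        for s, inv in zip(summaries, inv_covs):
            diff = chunk - s.mean
            sq = np.einsum("ij,jk,ik->i", diff, inv, diff)
            per_cluster.append(np.sqrt(np.maximum(sq, 0.0)))
        yield np.min(per_cluster, axis=0)


# Synthetic demo: two tight clusters plus one point lying between them.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(10.0, 0.5, size=(200, 2)),
    [[5.0, 5.0]],
])
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
summaries = fit_summaries(chunks, radius=4.0)
scores = np.concatenate(list(score_chunks(chunks, summaries)))
print("index of highest score:", int(np.argmax(scores)))
```

Only one chunk and the fixed-size summaries are held in memory at any time, which is the property the abstract emphasizes. In a real setting the chunks would be read from disk twice instead of being sliced from an in-memory array.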

artificial intelligence, data mining, machine learning

Country:
- Europe
  - Italy (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Germany > Rhineland-Palatinate
    - Kaiserslautern (0.04)
- Asia
  - India (0.04)
  - Middle East > Iran
    - Tehran Province > Tehran (0.04)

Genre:
- Research Report (0.81)
- Workflow (0.68)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (1.00)
