Fast and explainable clustering based on sorting

Feb-3-2022–arXiv.org Machine Learning

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases: a greedy aggregation of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters: a distance parameter for the aggregation and a parameter controlling the minimal cluster size. Extensive experiments on synthetic and real-world datasets, with various cluster shapes and low to high feature dimensionality, give a comprehensive evaluation of the clustering performance and demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.
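
The two phases described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Several choices are assumptions made for illustration only: sorting by the first coordinate, merging two groups when their starting points lie within `scale * radius` of each other, the quadratic merge loop, and labeling clusters smaller than `min_size` as `-1`. The function names (`aggregate`, `merge_groups`, `cluster_sketch`) and the `scale` parameter are hypothetical.

```python
import math
from collections import Counter


def aggregate(points, radius):
    """Phase 1 (sketch): greedy aggregation of the sorted data into groups."""
    # Assumption: sort by the first coordinate.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    group_of = [-1] * len(points)
    starts = []  # index of the starting point of each group
    for pos, i in enumerate(order):
        if group_of[i] != -1:
            continue  # already absorbed by an earlier group
        g = len(starts)
        starts.append(i)
        group_of[i] = g
        for k in range(pos + 1, len(order)):
            j = order[k]
            # Early termination enabled by sorting: if the sort keys differ
            # by more than radius, no later point can be within radius.
            if points[j][0] - points[i][0] > radius:
                break
            if group_of[j] == -1 and math.dist(points[i], points[j]) <= radius:
                group_of[j] = g
    return group_of, starts


def merge_groups(points, group_of, starts, radius, scale=1.5, min_size=1):
    """Phase 2 (sketch): merge nearby groups into clusters via union-find."""
    parent = list(range(len(starts)))

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Assumption: merge groups whose starting points are close. The
    # all-pairs loop is quadratic in the number of groups, for brevity.
    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            if math.dist(points[starts[a]], points[starts[b]]) <= scale * radius:
                parent[find(a)] = find(b)

    roots = [find(g) for g in group_of]
    sizes = Counter(roots)
    ids, labels = {}, []
    for r in roots:
        # Assumption: clusters below the minimal size get the label -1.
        labels.append(-1 if sizes[r] < min_size else ids.setdefault(r, len(ids)))
    return labels


def cluster_sketch(points, radius, min_size=1, scale=1.5):
    group_of, starts = aggregate(points, radius)
    return merge_groups(points, group_of, starts, radius, scale, min_size)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0)]
    print(cluster_sketch(data, radius=0.5, min_size=2))
```

In this sketch, `radius` plays the role of the distance parameter for the aggregation and `min_size` the role of the minimal cluster size. Because every point records which group absorbed it and which groups were merged, the path from a point to its cluster label can be traced step by step, which hints at why a method of this kind lends itself to explanations.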

algorithm, classix, dbscan

Country:
- North America > United States
  - Missouri (0.04)
- Europe
  - United Kingdom > England
    - Greater Manchester > Manchester (0.04)
  - Netherlands > North Brabant
    - Eindhoven (0.04)

Genre:
- Research Report (0.81)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (1.00)