Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood

Oct-3-2024–arXiv.org Artificial Intelligence

Density-based spatial clustering of points in $\mathbb{R}^n$ has a myriad of applications across industries. We generalise this problem to the density-based clustering of lines in high-dimensional spaces, keeping in mind that no valid distance measure between lines satisfies the triangle inequality. In this paper, we design a clustering algorithm that generates, for each line, a customised neighbourhood of fixed volume (given as a parameter), shaped by an optional parameter in the form of a continuous probability density function. The algorithm is not sensitive to outliers and can effectively identify noise in the data using a cardinality parameter. One pivotal application of this algorithm is clustering data points in $\mathbb{R}^n$ with missing entries while utilising domain knowledge of the respective data. In particular, the proposed algorithm can cluster $n$-dimensional data points that contain at least $(n-1)$-dimensional information. We illustrate the neighbourhoods for standard probability distributions with continuous probability density functions and demonstrate the effectiveness of our algorithm on various synthetic and real-world datasets (e.g., rail and road networks). The experimental results also highlight its application in clustering incomplete data.
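To make the link between incomplete points and lines concrete, here is a minimal sketch (not the paper's algorithm). It assumes that a point in $\mathbb{R}^n$ with exactly one missing entry is read as the axis-parallel line through all its possible completions, and it uses the closest-approach distance as one natural candidate measure between lines. The function names `line_from_incomplete_point` and `closest_approach` are hypothetical and chosen only for this illustration. The example shows that this candidate measure violates the triangle inequality, which is the kind of obstacle the abstract refers to.

```python
import math

def line_from_incomplete_point(point):
    """Turn a point with exactly one missing entry (None) into a line.

    Returns (anchor, direction): the anchor has 0.0 in the missing slot and
    the direction is the unit vector along the missing axis.
    """
    missing = [i for i, v in enumerate(point) if v is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing entry")
    k = missing[0]
    anchor = [0.0 if v is None else float(v) for v in point]
    direction = [1.0 if i == k else 0.0 for i in range(len(point))]
    return anchor, direction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_approach(line_a, line_b):
    """Minimum Euclidean distance between two lines (anchor, unit direction)."""
    (p, u), (q, v) = line_a, line_b
    w = [a - b for a, b in zip(p, q)]
    b = dot(u, v)
    d, e = dot(u, w), dot(v, w)
    denom = 1.0 - b * b  # zero when the lines are parallel
    if denom < 1e-12:
        # Parallel lines: drop the component of w along the shared direction.
        perp = [wi - d * ui for wi, ui in zip(w, u)]
        return math.sqrt(dot(perp, perp))
    # Parameters of the closest points on each line.
    s = (b * e - d) / denom
    t = (e - b * d) / denom
    diff = [wi + s * ui - t * vi for wi, ui, vi in zip(w, u, v)]
    return math.sqrt(dot(diff, diff))

# Three incomplete points in R^3, each missing one coordinate.
A = line_from_incomplete_point([None, 0, 0])  # line along x through (0, 0, 0)
B = line_from_incomplete_point([0, 0, None])  # line along z through (0, 0, 0)
C = line_from_incomplete_point([None, 0, 1])  # line along x through (0, 0, 1)

d_ab = closest_approach(A, B)  # 0.0: A and B intersect
d_bc = closest_approach(B, C)  # 0.0: B and C intersect
d_ac = closest_approach(A, C)  # 1.0: A and C are parallel, one unit apart
print(d_ac > d_ab + d_bc)      # True: the triangle inequality fails
```

Because B meets both A and C, the two "short" legs have length zero while A and C stay one unit apart, so a plain distance threshold cannot play the role it plays for points. This is only a motivating illustration under the stated assumptions; the paper's probabilistic neighbourhood construction is not reproduced here.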

dataset, line segment, neighbourhood

Country:
- North America > United States (0.29)
- Asia
  - Middle East > Israel
    - Haifa District > Haifa (0.04)
  - India > West Bengal
    - Kolkata (0.04)

Genre:
- Research Report (0.64)

Industry:
- Transportation > Ground (0.49)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (1.00)