Asymptotics for the $k$-means
arXiv.org Artificial Intelligence
Clustering is one of the most important unsupervised learning techniques for understanding underlying data structures. The goal is to partition a data set into subsets, called clusters, such that observations within each cluster are as homogeneous as possible and observations in different clusters are as heterogeneous as possible. Clustering is usually carried out by specifying a similarity or dissimilarity measure between observations. Examples include the k-means [17, 19, 29, 37], the k-medians [3], the k-modes [5], and the generalized k-means [2, 31, 45], as well as many of their modifications [21, 24, 42]. Among these, the k-means has been one of the most straightforward and popular methods since it was proposed sixty years ago [23, 36]. Despite its widespread use, the investigation of its theoretical properties still lags far behind, which hinders the development of more precise k-means methods in practice. The goal of the present research is to propose a new concept, called clustering consistency, for the asymptotics of the k-means, together with a clustering method that outperforms the existing k-means methods adopted by many software packages, including those used in R and Python.
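For readers unfamiliar with the method the abstract discusses, the standard k-means heuristic (Lloyd's algorithm) alternates between assigning each observation to its nearest center and recomputing each center as the mean of its cluster. The sketch below is a minimal illustration of that generic procedure, not the paper's proposed method; the deterministic farthest-point initialization is an assumption chosen here for reproducibility.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic greedy initialization (illustrative choice):
    start at X[0], then repeatedly add the point farthest from
    the centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    return np.asarray(centers)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and
    centroid (mean) updates until the centers stop moving."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # assign each observation to its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its cluster;
        # keep the old center if a cluster becomes empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# demo: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

Lloyd's algorithm only finds a local minimum of the within-cluster sum of squares, which is precisely why asymptotic questions such as the clustering consistency studied in this paper are nontrivial.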
Nov-17-2022