Asymptotics for The $k$-means
arXiv.org Artificial Intelligence
Clustering is one of the most important unsupervised learning techniques for understanding the underlying structure of data. The goal is to partition a data set into subsets, called clusters, such that observations within the same cluster are as homogeneous as possible and observations in different clusters are as heterogeneous as possible. Clustering is usually carried out by specifying a similarity or dissimilarity measure between observations. Examples include the k-means [17, 19, 29, 37], the k-medians [3], the k-modes [5], and the generalized k-means [2, 31, 45], as well as many of their modifications [21, 24, 42]. Among these, the k-means has been considered one of the most straightforward and popular methods since it was proposed sixty years ago [23, 36]. Although the method is well known, the investigation of its theoretical properties still lags far behind, which makes it difficult to develop more precise k-means methods in practice. The goal of the present research is to propose a new concept, called clustering consistency, for the asymptotics of the k-means, and to derive from it a clustering method that is better than the existing k-means methods adopted by many software packages, including those in R and Python.
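For readers who want a concrete picture of the standard k-means that the abstract takes as its baseline, here is a minimal sketch of the classical Lloyd iteration. It alternates between assigning each observation to its nearest centroid and moving each centroid to the mean of its cluster, which decreases the within-cluster sum of squares $\sum_{j=1}^{k}\sum_{x_i \in C_j}\|x_i-\mu_j\|^2$ until a local optimum is reached. This is an illustrative assumption, not the clustering-consistency method the paper proposes: the function names (`lloyd_kmeans`, `squared_distance`, `mean`), the squared-Euclidean dissimilarity, and the random initialization from data points are choices made for the sketch, and practical implementations typically add refinements such as better initialization and multiple restarts.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the dissimilarity assumed in this sketch."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(dim))


def lloyd_kmeans(
    data: List[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int], float]:
    """Hypothetical helper: textbook Lloyd iteration for k-means.

    Returns (centroids, labels, wcss), where wcss is the within-cluster
    sum of squares. Only a local optimum is guaranteed.
    """
    rng = random.Random(seed)
    # Initialize the centroids with k distinct observations chosen at random.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: attach each observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == c]
            # Keep the previous centroid if a cluster becomes empty.
            new_centroids.append(mean(members) if members else centroids[c])
        # Stop once neither the partition nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, wcss
```

For example, on the six points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` with `k=2`, the two tight groups end up in separate clusters with centroids `(1/3, 1/3)` and `(31/3, 31/3)`, and the within-cluster sum of squares is 8/3. Because the result can depend on the starting centroids, this kind of procedure is usually run from several initializations and the lowest-wcss solution is kept.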
Nov-17-2022