Asymptotics for the $k$-means

Zhang, Tonglin

arXiv.org Artificial Intelligence 

Clustering is one of the most important unsupervised learning techniques for understanding underlying data structures. The goal is to partition a data set into subsets, called clusters, such that observations within each cluster are as homogeneous as possible and observations in different clusters are as heterogeneous as possible. Clustering is usually carried out by specifying a similarity or dissimilarity measure between observations. Examples include the k-means [17, 19, 29, 37], the k-medians [3], the k-modes [5], and the generalized k-means [2, 31, 45], as well as many of their modifications [21, 24, 42]. Among these, the k-means has been considered one of the most straightforward and popular methods since it was proposed sixty years ago [23, 36]. Although the method is well known, investigation of its theoretical properties still lags far behind its practical use, leading to difficulties in developing more precise k-means methods in practice. The goal of the present research is to propose a new concept, called clustering consistency, for the asymptotics of the k-means, together with a resulting clustering method that improves on the existing k-means methods adopted by many software packages, including those in R and Python.
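To make the alternating objective concrete, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) the abstract refers to, written in pure Python. The function name `kmeans` and the toy data are illustrative assumptions, not code from the paper; the paper's proposed method refines this baseline rather than reimplementing it.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # illustrative init: k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                new_centers.append(tuple(sum(xs) / len(cl) for xs in zip(*cl)))
            else:
                new_centers.append(centers[j])  # keep an emptied cluster's center
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    labels = [min(range(k),
                  key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
              for p in points]
    return centers, labels

# Usage on two well-separated toy clusters:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2)
```

This within-cluster sum-of-squares objective is exactly what production implementations (e.g. those shipped with R and Python libraries) optimize; the theoretical question the paper addresses is what such estimated partitions converge to as the sample size grows.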
