Level Sets or Gradient Lines? A Unifying View of Modal Clustering

Sep-17-2021–arXiv.org Machine Learning

Up until the 1970s, there were two main ways of clustering points in space. One of them, perhaps pioneered by Pearson [44], was to fit a (usually Gaussian) mixture to the data and then classify each data point, as well as any point made available at a later date, according to the most likely component in the mixture. The other was based on a direct partitioning of the space, most notably by minimizing the average squared distance from each point to its nearest center: the K-means problem, whose computational difficulty led to a number of famous algorithms [22, 31, 36, 37, 39] and likely played a role in motivating the development of hierarchical clustering [21, 25, 54, 63]. In the 1970s, two decidedly nonparametric approaches to clustering were proposed, both based on the topography given by the population density. In practice, of course, the density is estimated, often by some form of kernel density estimation.
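As a rough illustration of two quantities mentioned above, the following sketch computes the K-means objective (the average squared distance to the nearest center) and a one-dimensional Gaussian kernel density estimate. It is not taken from the paper: the function names, the toy data, and the bandwidth of 0.5 are all assumptions chosen for illustration.

```python
import math

def kmeans_objective(points, centers):
    """Average, over points, of the squared distance to the nearest center."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance from p to each center; keep the smallest.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)

def gaussian_kde(sample, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm
    return density

# Hypothetical toy data: two groups of points and one candidate center per group.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
objective = kmeans_objective(points, centers)

# Hypothetical 1-D sample with two well-separated groups (bandwidth is an assumption).
sample = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
f_hat = gaussian_kde(sample, bandwidth=0.5)
```

On this toy sample the estimated density is higher near -2 and 2 than near 0. This is the kind of "topography" that density-based clustering works from, although the estimate depends on the chosen bandwidth.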

cluster tree, gradient flow, gradient line

Country:
- North America > United States
  - Virginia > Fairfax County
    - Fairfax (0.04)
  - California > San Diego County
    - San Diego (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (1.00)