Goto

Collaborating Authors

 vertex cover











Learning Augmented Graph $k$-Clustering

Fan, Chenglin, Shin, Kijun

arXiv.org Artificial Intelligence

Clustering is a cornerstone of unsupervised machine learning, widely applied in fields such as data organization, anomaly detection, and community detection in networks [Xu and Wunsch, 2005]. Among clustering problems, the k -means and k -median problems stand out as fundamental due to their simplicity and effectiveness. Traditional algorithms aim to partition data into k clusters, minimizing either the sum of squared distances (k-means) or the sum of absolute distances (k-median) to their respective cluster centers. The k -means algorithm has been a cornerstone of clustering research for decades, tracing its roots to foundational works by [MacQueen, 1967] and [Lloyd, 1982], who introduced the iterative optimization approach still used today. Extensions by [Hartigan and Wong, 1979] improved convergence, while [Forgy, 1965] proposed widely-used initialization techniques. The optimization principles underlying k -means were influenced by earlier algorithmic developments, such as Floyd's contributions to optimization [Floyd, 1962]. Improvements include k -means++ [Arthur and Vassilvitskii, 2007], which introduced a probabilistic seeding strategy to improve initialization quality and convergence, and Mini-Batch k -means[Sculley, 2010], which enabled clustering on massive datasets with reduced computational overhead.