AITopics | k-means algorithm

Collaborating Authors

k-means algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Label consistency in overfitted generalized k-means

Neural Information Processing SystemsApr-25-2026, 15:13:49 GMT

We provide theoretical guarantees for label consistency in generalized k-means problems, with an emphasis on the overfitted case where the number of clusters used by the algorithm is more than the ground truth. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels. We consider both exact and approximate recovery of the labels. Our results hold for any constant-factor approximation to the k-means problem. The results are also model-free and only based on bounds on the maximum or average distance of the data points to the true cluster centers. These centers themselves are loosely defined and can be taken to be any set of points for which the aforementioned distances can be controlled. We show the usefulness of the results with applications to some manifold clustering problems.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

46f76a4bda9a9579eab38a8f6eabcda1-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-12-2026, 01:46:08 GMT

For`, we construct the systems of hyperrectangles by4 first precomputing anapproximate k-NN distance estimate using Ball Trees foreachdata point, andthen clustering5 the top-q densest data points intoT partitions using the k-means algorithm, where we binary search for the optimal6 parameterq.

algorithm, artificial intelligence, machine learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.98)

Add feedback

735ddec196a9ca5745c05bec0eaa4bf9-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 22:29:17 GMT

algorithm, informative feature, selection, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

q-means: A quantum algorithm for unsupervised machine learning

Neural Information Processing SystemsDec-25-2025, 01:42:11 GMT

Quantum information is a promising new paradigm for fast computations that can provide substantial speedups for many algorithms we use today. Among them, quantum machine learning is one of the most exciting applications of quantum computers. In this paper, we introduce q-means, a new quantum algorithm for clustering. It is a quantum version of a robust k-means algorithm, with similar convergence and precision guarantees. We also design a method to pick the initial centroids equivalent to the classical k-means++ method. Our algorithm provides currently an exponential speedup in the number of points of the dataset, compared to the classical k-means algorithm. We also detail the running time of q-means when applied to well-clusterable datasets. We provide a detailed runtime analysis and numerical simulations for specific datasets. Along with the algorithm, the theorems and tools introduced in this paper can be reused for various applications in quantum machine learning.

algorithm, quantum algorithm, unsupervised machine, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A novel k-means clustering approach using two distance measures for Gaussian data

Gada, Naitik

arXiv.org Machine LearningNov-25-2025

Clustering algorithms have long been the topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarity and structure within raw data, this paper explores a novel approach to \textit{k}-means clustering. Here we present a \textit{k}-means clustering algorithm that takes both the within cluster distance (WCD) and the inter cluster distance (ICD) as the distance metric to cluster the data into \emph{k} clusters pre-determined by the Calinski-Harabasz criterion in order to provide a more robust output for the clustering analysis. The idea with this approach is that by including both the measurement metrics, the convergence of the data into their clusters becomes solidified and more robust. We run the algorithm with some synthetically produced data and also some benchmark data sets obtained from the UCI repository. The results show that the convergence of the data into their respective clusters is more accurate by using both WCD and ICD measurement metrics. The algorithm is also better at clustering the outliers into their true clusters as opposed to the traditional \textit{k} means method. We also address some interesting possible research topics that reveal themselves as we answer the questions we initially set out to address.

algorithm, cluster center, variance, (13 more...)

arXiv.org Machine Learning

2511.17823

Country:

North America > United States > Wisconsin (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A Geometric Approach to $k$-means

Hong, Jiazhen, Qian, Wei, Chen, Yudong, Zhang, Yuqian

arXiv.org Artificial IntelligenceOct-24-2025

\kmeans clustering is a fundamental problem in many scientific and engineering domains. The optimization problem associated with \kmeans clustering is nonconvex, for which standard algorithms are only guaranteed to find a local optimum. Leveraging the hidden structure of local solutions, we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution or the ground truth clustering. This framework consists of iteratively alternating between two steps: (i) detect mis-specified clusters in a local solution, and (ii) improve the local solution by non-local operations. We discuss specific implementation of these steps, and elucidate how the proposed framework unifies many existing variants of \kmeans algorithms through a geometric perspective. We also present two natural variants of the proposed framework, where the initial number of clusters may be over- or under-specified. We provide theoretical justifications and extensive experiments to demonstrate the efficacy of the proposed approach.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TKDE.2025.3616858

2201.04822

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Simple and Scalable Sparse k-means Clustering via Feature Ranking

Neural Information Processing SystemsOct-3-2025, 06:01:50 GMT

Fortunately, in many realistic scenarios, only a handful of features may be relevant in distinguishing clusters.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

46f76a4bda9a9579eab38a8f6eabcda1-AuthorFeedback.pdf

Neural Information Processing SystemsOct-2-2025, 15:52:50 GMT

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Add feedback

Kernel K-means clustering of distributional data

Baíllo, Amparo, Berrendero, Jose R., Sánchez-Signorini, Martín

arXiv.org Machine LearningSep-23-2025

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing kernel Hilbert space (RKHS) $\mathcal H$. By mapping each distribution to its corresponding kernel mean embedding in $\mathcal H$, we obtain a sample in this RKHS where we carry out the $K$-means clustering procedure, which provides an unsupervised classification of the original sample. The procedure is simple and computationally feasible even for dimension $p>1$. The simulation studies provide insight into the choice of the kernel and its tuning parameter. The performance of the proposed clustering procedure is illustrated on a collection of Synthetic Aperture Radar (SAR) images.

distributional data, kernel, procedure, (17 more...)

arXiv.org Machine Learning

2509.18037

Country: Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Unsupervised Dataset Cleaning Framework for Encrypted Traffic Classification

Qiu, Kun, Wang, Ying, Li, Baoqian, Zhu, Wenjun

arXiv.org Artificial IntelligenceSep-3-2025

Traffic classification, a technique for assigning network flows to predefined categories, has been widely deployed in enterprise and carrier networks. With the massive adoption of mobile devices, encryption is increasingly used in mobile applications to address privacy concerns. Consequently, traditional methods such as Deep Packet Inspection (DPI) fail to distinguish encrypted traffic. To tackle this challenge, Artificial Intelligence (AI), in particular Machine Learning (ML), has emerged as a promising solution for encrypted traffic classification. A crucial prerequisite for any ML-based approach is traffic data cleaning, which removes flows that are not useful for training (e.g., irrelevant protocols, background activity, control-plane messages, and long-lived sessions). Existing cleaning solutions depend on manual inspection of every captured packet, making the process both costly and time-consuming. In this poster, we present an unsupervised framework that automatically cleans encrypted mobile traffic. Evaluation on real-world datasets shows that our framework incurs only a 2%~2.5% reduction in classification accuracy compared with manual cleaning. These results demonstrate that our method offers an efficient and effective preprocessing step for ML-based encrypted traffic classification.

artificial intelligence, machine learning, traffic, (13 more...)

arXiv.org Artificial Intelligence

2509.00701

Country: Asia > China (0.16)

Genre: Research Report > New Finding (0.36)

Industry: Information Technology > Security & Privacy (0.35)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback