k-means


Distributionally Robust K-Means Clustering

Malik, Vikrant, Kargin, Taylan, Hassibi, Babak

arXiv.org Machine Learning

In recent years, the widespread availability of large-scale, high-dimensional datasets has driven significant interest in clustering algorithms that are both computationally efficient and robust to distributional shifts and outliers. The classical clustering method, K-means, can be seen as an application of the Lloyd-Max quantization algorithm, in which the distribution being quantized is the empirical distribution of the points to be clustered. This empirical distribution generally differs from the true underlying distribution, especially when the number of points to be clustered is small. This induces a distributional shift, which can also arise in many real-world settings, such as image segmentation, biological data analysis, and sensor networks, due to noise variations, sensor inaccuracies, or environmental changes. Distributional shifts can severely impact the performance of clustering algorithms, leading to degraded cluster assignments and unreliable downstream analysis. The field of clustering has a rich history. One of the most popular algorithms in this field is the K-means (KM) algorithm, introduced by [1], which computes centroids by iteratively updating the conditional mean of the data in the Voronoi regions induced by the centroids. However, standard K-means is sensitive to initialization and, in general, converges only to a local minimum.
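To make the Lloyd-Max connection concrete, here is a minimal sketch of the plain Lloyd iteration the abstract describes: alternate between assigning points to their nearest centroid (the Voronoi regions) and recomputing each centroid as the conditional mean of its region. This is the standard, non-robust baseline, not the paper's distributionally robust variant; the function name, random initialization, and stopping rule are illustrative.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration on the empirical distribution of the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid,
        # i.e. the Voronoi region it falls in.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid <- conditional mean of its Voronoi region.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # fixed point reached: in general only a local minimum
        centroids = new_centroids
    return centroids, labels
```

As the abstract notes, the output depends on the random initialization and is only a local minimum in general, which is what motivates seeding schemes and robust reformulations.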


On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering

Muthukrishnan, Gokularam, Tandon, Anshoo

arXiv.org Machine Learning

Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule, derived by minimizing an upper bound on the expected deviation in the $K$-means objective function, leading to a more principled discretization strategy for non-interactive private clustering. Compared to prior work, our grid resolution differs both in its dependence on the number of clusters and in its scaling with the dataset size and privacy budget. Extensive numerical results show that the proposed strategy yields more accurate clustering than state-of-the-art techniques, even under tight privacy budgets.
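A minimal sketch of the non-interactive pipeline the abstract describes, assuming the Laplace mechanism under add/remove neighbouring datasets: the histogram is privatized once, and any post-processing of the released synopsis (here, weighted Lloyd iterations on the bin centers) incurs no additional privacy loss. The grid resolution `grid_per_dim` and the domain bounds `lo`, `hi` are illustrative placeholders, not the selection rule proposed in the paper.

```python
import numpy as np

def private_histogram_kmeans(X, k, grid_per_dim, epsilon, lo, hi, seed=0):
    """Release a Laplace-noised histogram once, then cluster the synopsis.
    Grid resolution and domain bounds [lo, hi] are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    edges = [np.linspace(lo, hi, grid_per_dim + 1) for _ in range(d)]
    counts, _ = np.histogramdd(X, bins=edges)
    # Adding or removing one individual changes a single bin count by 1, so
    # the histogram has L1 sensitivity 1 and Laplace(1/epsilon) noise gives
    # epsilon-DP; everything below is post-processing, costing no privacy.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    weights = np.clip(noisy, 0, None).ravel()
    mids = [(e[:-1] + e[1:]) / 2 for e in edges]
    bins = np.stack(np.meshgrid(*mids, indexing="ij"), axis=-1).reshape(-1, d)
    # Weighted Lloyd iterations on the released (bin center, noisy count) pairs.
    centroids = bins[rng.choice(len(bins), size=k, replace=False)]
    for _ in range(50):
        labels = np.linalg.norm(bins[:, None] - centroids[None], axis=2).argmin(1)
        for j in range(k):
            w = weights[labels == j]
            if w.sum() > 0:
                centroids[j] = (bins[labels == j] * w[:, None]).sum(0) / w.sum()
    return centroids
```

The trade-off the paper analyzes is visible here: a finer grid shrinks the quantization bias of replacing points by bin centers but spreads the Laplace noise over more bins.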


Distributed Gradient Clustering: Convergence and the Effect of Initialization

Armacki, Aleksandar, Sharma, Himkant, Bajović, Dragana, Jakovetić, Dušan, Chakraborty, Mrityunjoy, Kar, Soummya

arXiv.org Machine Learning

We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms introduced in [1], which operate over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments evaluating the effects of center initialization on the performance of our family of methods, demonstrating that our methods are more resilient to initialization than centralized gradient clustering [2]. Next, inspired by the $K$-means++ initialization [3], we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods compared to baseline random initialization.
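For reference, a sketch of the centralized $K$-means++ seeding [3] that the proposed distributed scheme takes as its inspiration: the first center is drawn uniformly, and each subsequent center is sampled with probability proportional to the squared distance from the nearest already-chosen center. The paper's distributed variant must realize this sampling using only neighbour-to-neighbour communication, which this centralized snippet does not attempt.

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """Centralized K-means++ seeding: D^2-weighted sampling of centers."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centers)[None, :, :],
                           axis=2) ** 2,
            axis=1,
        )
        # Sample the next center with probability proportional to D^2.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers)
```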


fbefa505c8e8bf6d46f38f5277fed8d6-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to point out that K-means is used only once to initialize the representative sets and is not an intrinsic component of the online algorithm. What is important to observe is that N is kept constant throughout in order to reduce the storage footprint and to ensure low-complexity online processing. To keep the list size constant, for every added point another point is removed. Also, the reviewer is correct in observing that N does not feature in the convergence results, which are asymptotic and do not imply anything about the convergence rate. Clearly, if the point dimension m is large, it is beneficial to increase N.
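A tiny sketch of the constant-size representative list described in the response: every insertion beyond capacity evicts one stored point, so storage and per-step complexity stay bounded by N. The excerpt does not say which point is removed; oldest-first eviction below is purely illustrative.

```python
from collections import deque

class RepresentativeSet:
    """Fixed-capacity list of N representatives for online processing.
    The eviction rule (oldest-first) is an illustrative assumption."""

    def __init__(self, n):
        # A deque with maxlen=n silently drops the oldest entry on append,
        # keeping the list size constant once capacity is reached.
        self.points = deque(maxlen=n)

    def add(self, x):
        self.points.append(x)
```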