AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

f7f043c3438ba9e385c51bcf50ed007e-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 20:52:19 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Communications (0.67)

Add feedback

f7f043c3438ba9e385c51bcf50ed007e-Paper-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 20:52:15 GMT

artificial intelligence, data mining, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.69)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Near-Optimal Correlation Clustering with Privacy

Neural Information Processing SystemsAug-19-2025, 08:52:13 GMT

In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Rhode Island > Providence County > Providence (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(10 more...)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Stability and Generalization of Kernel Clustering: From Single Kernel to Multiple Kernel

Neural Information Processing SystemsAug-19-2025, 08:38:44 GMT

Multiple kernel clustering (MKC) is an important research topic that has been widely studied for decades.

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Data Science > Data Mining (0.70)

Add feedback

Efficient and Effective Optimal Transport-Based Biclustering

Neural Information Processing SystemsAug-19-2025, 05:55:15 GMT

In this paper, we leverage optimal transport (OT) which has gained momentum in the machine learning community to propose a novel and scalable biclustering model that generalizes several classical biclustering approaches.

artificial intelligence, machine learning, matrix, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

cf7700139af1fa346d2f57f1f5c26c18-Paper-Conference.pdf

Neural Information Processing SystemsAug-19-2025, 01:58:48 GMT

data mining, graph structure, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Oceania > Australia > Queensland (0.04)
(3 more...)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Simultaneous estimation of connectivity and dimensionality in samples of networks

Jiang, Wenlong, McKennan, Chris, Arroyo, Jesús, Cape, Joshua

arXiv.org Machine LearningAug-19-2025

An overarching objective in contemporary statistical network analysis is extracting salient information from datasets consisting of multiple networks. To date, considerable attention has been devoted to node and network clustering, while comparatively less attention has been devoted to downstream connectivity estimation and parsimonious embedding dimension selection. Given a sample of potentially heterogeneous networks, this paper proposes a method to simultaneously estimate a latent matrix of connectivity probabilities and its embedding dimensionality or rank after first pre-estimating the number of communities and the node community memberships. The method is formulated as a convex optimization problem and solved using an alternating direction method of multipliers algorithm. We establish estimation error bounds under the Frobenius norm and nuclear norm for settings in which observable networks have blockmodel structure, even when node memberships are imperfectly recovered. When perfect membership recovery is possible and dimensionality is much smaller than the number of communities, the proposed method outperforms conventional averaging-based methods for estimating connectivity and dimensionality. Numerical studies empirically demonstrate the accuracy of our method across various scenarios. Additionally, analysis of a primate brain dataset demonstrates that posited connectivity is not necessarily full rank in practice, illustrating the need for flexible methodology.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2508.12483

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning

Veeramachaneni, Sowmini Devi, Garimella, Ramamurthy

arXiv.org Machine LearningAug-19-2025

This paper presents Constrained Centroid Clustering (CCC), a method that extends classical centroid-based clustering by enforcing a constraint on the maximum distance between the cluster center and the farthest point in the cluster. Using a Lagrangian formulation, we derive a closed-form solution that maintains interpretability while controlling cluster spread. To evaluate CCC, we conduct experiments on synthetic circular data with radial symmetry and uniform angular distribution. Using ring-wise, sector-wise, and joint entropy as evaluation metrics, we show that CCC achieves more compact clusters by reducing radial spread while preserving angular structure, outperforming standard methods such as K-means and GMM. The proposed approach is suitable for applications requiring structured clustering with spread control, including sensor networks, collaborative robotics, and interpretable pattern analysis.

cluster center, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2508.12758

Country:

Asia > India > Telangana > Hyderabad (0.05)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > India > Andhra Pradesh > Kakinada (0.04)

Genre:

Research Report > Promising Solution (0.50)
Overview > Innovation (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

FedUHD: Unsupervised Federated Learning using Hyperdimensional Computing

Lee, You Hak, Yu, Xiaofan, Zhao, Quanling, Ponzina, Flavio, Rosing, Tajana

arXiv.org Artificial IntelligenceAug-19-2025

Unsupervised federated learning (UFL) has gained attention as a privacy-preserving, decentralized machine learning approach that eliminates the need for labor-intensive data labeling. However, UFL faces several challenges in practical applications: (1) non-independent and identically distributed (non-iid) data distribution across devices, (2) expensive computational and communication costs at the edge, and (3) vulnerability to communication noise. Previous UFL approaches have relied on deep neural networks (NN), which introduce substantial overhead in both computation and communication. In this paper, we propose FedUHD, the first UFL framework based on Hyperdimensional Computing (HDC). HDC is a brain-inspired computing scheme with lightweight training and inference operations, much smaller model size, and robustness to communication noise. FedUHD introduces two novel HDC-based designs to improve UFL performance. On the client side, a kNN-based cluster hypervector removal method addresses non-iid data samples by eliminating detrimental outliers. On the server side, a weighted HDC aggregation technique balances the non-iid data distribution across clients. Our experiments demonstrate that FedUHD achieves up to 173.6x and 612.7x better speedup and energy efficiency, respectively, in training, up to 271x lower communication cost, and 15.50% higher accuracy on average across diverse settings, along with superior robustness to various types of noise compared to state-of-the-art NN-based UFL approaches.

artificial intelligence, feduhd, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.12021

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Deep Language Geometry: Constructing a Metric Space from LLM Weights

Shamrai, Maksym, Hamolia, Vladyslav

arXiv.org Artificial IntelligenceAug-19-2025

We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2508.11676

Country: Europe > Ukraine (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback