AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Welfare-Centric Clustering

Zhang, Claire Jie, Esmaeili, Seyed A., Morgenstern, Jamie

arXiv.org Artificial IntelligenceAug-15-2025

Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.10345

Country:

Europe (0.45)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models

Halperin, Igor

arXiv.org Artificial IntelligenceAug-15-2025

The proliferation of Large Language Models (LLMs) is challenged by hallucinations, critical failure modes where models generate non-factual, nonsensical or unfaithful text. This paper introduces Semantic Divergence Metrics (SDM), a novel lightweight framework for detecting Faithfulness Hallucinations -- events of severe deviations of LLMs responses from input contexts. We focus on a specific implementation of these LLM errors, {confabulations, defined as responses that are arbitrary and semantically misaligned with the user's query. Existing methods like Semantic Entropy test for arbitrariness by measuring the diversity of answers to a single, fixed prompt. Our SDM framework improves upon this by being more prompt-aware: we test for a deeper form of arbitrariness by measuring response consistency not only across multiple answers but also across multiple, semantically-equivalent paraphrases of the original prompt. Methodologically, our approach uses joint clustering on sentence embeddings to create a shared topic space for prompts and answers. A heatmap of topic co-occurances between prompts and responses can be viewed as a quantified two-dimensional visualization of the user-machine dialogue. We then compute a suite of information-theoretic metrics to measure the semantic divergence between prompts and responses. Our practical score, $\mathcal{S}_H$, combines the Jensen-Shannon divergence and Wasserstein distance to quantify this divergence, with a high score indicating a Faithfulness hallucination. Furthermore, we identify the KL divergence KL(Answer $||$ Prompt) as a powerful indicator of \textbf{Semantic Exploration}, a key signal for distinguishing different generative behaviors. These metrics are further combined into the Semantic Box, a diagnostic framework for classifying LLM response types, including the dangerous, confident confabulation.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.10192

Genre: Research Report > New Finding (0.68)

Industry:

Banking & Finance (0.67)
Commercial Services & Supplies > Security & Alarm Services (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations Alexander Ritchie

Neural Information Processing SystemsAug-14-2025, 23:54:25 GMT

Most work on estimating mixture models assumes an iid sampling scheme.

dataset, mixture component, mixture model, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada (0.04)
Europe > Germany > Berlin (0.04)
Asia (0.04)

Industry: Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(2 more...)

Add feedback

Consistency of Constrained Spectral Clustering under Graph Induced Fair Planted Partitions

Neural Information Processing SystemsAug-14-2025, 23:39:01 GMT

Spectral clustering is popular among practitioners and theoreticians alike.

algorithm, constraint, spectral, (14 more...)

Neural Information Processing Systems

Country:

Asia > India > Karnataka > Bengaluru (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Europe > France (0.04)

Genre: Research Report (0.46)

Industry:

Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Add feedback

Deep Conditional Gaussian Mixture Model for Constrained Clustering

Neural Information Processing SystemsAug-14-2025, 18:06:30 GMT

Thus, we restrict our search for a constrained clustering approach to the class of deep generative models. Although these models have been successfully used in the unsupervised setting (Jiang et al., 2017; Dilokthanakul et al., 2016), their application to constrained clustering has been under-explored.

constraint, information, pairwise constraint, (12 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Germany > Bavaria > Regensburg (0.05)
North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

4c91cf13f8827a1b46656439e32ff74b-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 17:07:10 GMT

mixture model, optimal transport, proceedings, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)
Information Technology > Data Science (0.93)

Add feedback

Wasserstein K-means for clustering probability distributions

Neural Information Processing SystemsAug-14-2025, 15:54:08 GMT

Clustering is an important exploratory data analysis technique to group objects based on their similarity.

barycenter, wasserstein barycenter, wasserstein space, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Data-Driven Conditional Robust Optimization

Neural Information Processing SystemsAug-14-2025, 08:17:52 GMT

In most real world decision problems, the decision maker (DM) faces uncertainty either in the objective function that he aims to optimize, or some of the constraints that he needs to satisfy.

artificial intelligence, information, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.68)

Industry:

Banking & Finance > Trading (0.68)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

34260a400e39a802961470b3d3de99cc-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 05:07:42 GMT

algorithm, factor matrix, tensor, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
South America > Brazil (0.04)
North America > Cuba (0.04)
(12 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.68)

Add feedback

scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering

Li, Huifa, Fu, Jie, Zhuang, Xinlin, Yang, Haolin, Ling, Xinpeng, Cheng, Tong, xue, Haochen, Razzak, Imran, Chen, Zhili

arXiv.org Artificial IntelligenceAug-14-2025

Accurate cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, which provides valuable insights into cellular heterogeneity. However, due to the high dimensionality and prevalence of zero elements in scRNA-seq data, traditional clustering methods face significant statistical and computational challenges. While some advanced methods use graph neural networks to model cell-cell relationships, they often depend on static graph structures that are sensitive to noise and fail to capture the long-tailed distribution inherent in single-cell populations.To address these limitations, we propose scAGC, a single-cell clustering method that learns adaptive cell graphs with contrastive guidance. Our approach optimizes feature representations and cell graphs simultaneously in an end-to-end manner. Specifically, we introduce a topology-adaptive graph autoencoder that leverages a differentiable Gumbel-Softmax sampling strategy to dynamically refine the graph structure during training. This adaptive mechanism mitigates the problem of a long-tailed degree distribution by promoting a more balanced neighborhood structure. To model the discrete, over-dispersed, and zero-inflated nature of scRNA-seq data, we integrate a Zero-Inflated Negative Binomial (ZINB) loss for robust feature reconstruction. Furthermore, a contrastive learning objective is incorporated to regularize the graph learning process and prevent abrupt changes in the graph topology, ensuring stability and enhancing convergence. Comprehensive experiments on 9 real scRNA-seq datasets demonstrate that scAGC consistently outperforms other state-of-the-art methods, yielding the best NMI and ARI scores on 9 and 7 datasets, respectively.Our code is available at Anonymous Github.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.0918

Country: