AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation

Zhao, Daniel, Shankarampeta, Abhilash, Hu, Lanxiang, Rosing, Tajana, Zhang, Hao

arXiv.org Artificial IntelligenceOct-3-2025

We propose a novel method that leverages sparse autoencoders (SAEs) and clustering techniques to analyze the internal token representations of large language models (LLMs) and guide generations in mathematical reasoning tasks. Our approach first trains an SAE to generate sparse vector representations for training tokens, then applies k-means clustering to construct a graph where vertices represent token clusters and weighted edges capture sequential token transitions. Using this graph, we define an edge-weight based reward function to quantify adherence to established reasoning traces, thereby identifying exploitative reasoning trajectories. Additionally, we measure generation diversity from clustering to assess the extent of exploration. Our findings indicate that balancing both exploitation and exploration is crucial for achieving high accuracy in mathematical reasoning tasks. During generation, the SAE can serve as a scalable reward model to guide generations, ensuring a balanced trade-off between exploitation and exploration. This prevents extreme behaviors in either direction, ultimately fostering a higher-quality reasoning process in LLMs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.01528

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Add feedback

Landcover classification and change detection using remote sensing and machine learning: a case study of Western Fiji

Gurjar, Yadvendra, Wan, Ruoni, Farahbakhsh, Ehsan, Chandra, Rohitash

arXiv.org Artificial IntelligenceOct-3-2025

As a developing country, Fiji is facing rapid urbanisation, which is visible in the massive development projects that include housing, roads, and civil works. In this study, we present machine learning and remote sensing frameworks to compare land use and land cover change from 2013 to 2024 in Nadi, Fiji. The ultimate goal of this study is to provide technical support in land cover/land use modelling and change detection. We used Landsat-8 satellite image for the study region and created our training dataset with labels for supervised machine learning. We used Google Earth Engine and unsupervised machine learning via k-means clustering to generate the land cover map. We used convolutional neural networks to classify the selected regions' land cover types. We present a visualisation of change detection, highlighting urban area changes over time to monitor changes in the map.

artificial intelligence, classification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.13388

Country:

Oceania > Fiji (0.84)
Asia > China > Guangdong Province (0.28)

Genre: Research Report > New Finding (0.50)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.76)
Consumer Products & Services (0.69)
Law > Real Estate Law (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback

scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

Xu, Ping, Ning, Zhiyuan, Li, Pengjiang, Liu, Wenhao, Wang, Pengyang, Cui, Jiaxu, Zhou, Yuanchun, Wang, Pengfei

arXiv.org Artificial IntelligenceOct-3-2025

Single-cell RNA sequencing (scRNA-seq) reveals cell heterogeneity, with cell clustering playing a key role in identifying cell types and marker genes. Recent advances, especially graph neural networks (GNNs)-based methods, have significantly improved clustering performance. However, the analysis of scRNA-seq data remains challenging due to noise, sparsity, and high dimensionality. Compounding these challenges, GNNs often suffer from over-smoothing, limiting their ability to capture complex biological information. In response, we propose scSiameseClu, a novel Siamese Clustering framework for interpreting single-cell RNA-seq data, comprising of 3 key steps: (1) Dual Augmentation Module, which applies biologically informed perturbations to the gene expression matrix and cell graph relationships to enhance representation robustness; (2) Siamese Fusion Module, which combines cross-correlation refinement and adaptive information fusion to capture complex cellular relationships while mitigating over-smoothing; and (3) Optimal Transport Clustering, which utilizes Sinkhorn distance to efficiently align cluster assignments with predefined proportions while maintaining balance. Comprehensive evaluations on seven real-world datasets demonstrate that scSiameseClu outperforms state-of-the-art methods in single-cell clustering, cell type annotation, and cell type classification, providing a powerful tool for scRNA-seq data interpretation.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.12626

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 23:49:19 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes an incremental but very sensible and practical modification to'curriculum learning'. Given a partition of the training examples into classes, they propose an additional regularising term (and an additional parameter) to ensure that the'easy' examples selected during learning are spread across the classes, and not from one class. The partition into classes can come from a clustering algorithm, or from a priori knowledge. The idea is straightforward and sensible, and the authors propose an algorithm that looks efficient and correct.

algorithm, diversity, self-paced learning, (10 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Overview (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 22:49:16 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary of the paper: The paper studies the incremental clustering problem and shows several properties: - It shows that no deterministic memory-bounded incremental clustering method is nice-detecting. Specifically, the authors show that no deterministic nice-detecting incremental clustering algorithm can use less than 2^{cp-1} bits of memory for data in R^p under the l2 metric. Then some example algorithms are displayed. General comments: - The paper is written clearly and the guarantees in this paper are solid.

algorithm, incremental algorithm, sequential k-means, (12 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Add feedback

6a4d5952d4c018a1c1af9fa590a10dda-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 22:26:04 GMT

artificial intelligence, machine learning, modality, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report (0.46)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Differentially Private Algorithms for Learning Mixtures of Separated Gaussians

Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman

Neural Information Processing SystemsOct-2-2025, 22:13:09 GMT

(COL T 2005).

algorithm, gaussian, proceedings, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Tight Continuous Relaxation of the Balanced k-Cut Problem

Syama Sundar Rangapuram, Pramod Kaushik Mudrakarta, Matthias Hein

Neural Information Processing SystemsOct-2-2025, 22:06:33 GMT

Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced k -cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced k -cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the difficult sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method outperforms all existing approaches for ratio cut and other balanced k -cut criteria.

constraint, partition, relaxation, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > Germany > Saarland (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 21:53:31 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors present a novel non-parametric Bayesian model for unsupervised clustering. The model uses a two level hierarchy of Dirichlet process priors to handle clusters which may be multi-modal, skewed and/or heavy tailed. The authors present a collapsed Gibbs sampler for inference which exploits the conjugacy of the model. The authors do an excellent job of motivating the model by explaining the deficiencies of the standard infinite mixture of Gaussians.

algorithm, concentration parameter, infinite mixture, (11 more...)

Neural Information Processing Systems

Country: