AITopics | spectral clustering

spectral clustering

Cluster LOCO: Feature Importance For Interpreting Clusters

He, Claire M., Allen, Genevera I.

arXiv.org Machine Learning, Jun-15-2026

Clustering is widely used for exploratory analysis and scientific discovery, driving insights from market segmentation to biological data analysis, but its outputs can be difficult to interpret, audit, and reproduce as modern datasets become increasingly large and complex. Reliable use of clustering requires understanding which features drive the discovered structure, yet feature-level explanations for clustering remain scarce compared with methods in supervised learning. Furthermore, existing clustering feature importance scores are often tied to specific algorithms and data assumptions. To address these challenges, we propose Cluster LOCO (Leave-One-Covariate-Out), a family of model-agnostic feature importance scores for clustering. Cluster LOCO is built on feature occlusion and clustering generalizability, defined as whether cluster labels learned on one subset of the data can be accurately predicted on held-out samples. For any chosen clustering algorithm, Cluster LOCO quantifies a feature's importance by measuring how much its removal degrades generalizability. We first introduce Cluster LOCO-Split, which relies on data splitting, and then extend it to Cluster LOCO-MP, a minipatch ensemble-based version designed for large-scale data. Across synthetic simulations and an application to cell-type discovery in single-cell transcriptomics, we show that Cluster LOCO more reliably recovers informative features than existing clustering feature importance methods.
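
A minimal sketch of the split-based idea follows. It assumes k-means as the clustering algorithm and a nearest-centroid rule as the label predictor. Both are our choices: Cluster LOCO itself is model-agnostic, and the authors' exact procedure may differ. Here "generalizability" means the agreement between two sets of labels for the held-out half, maximized over label permutations. One set is predicted from the training-half centroids; the other comes from an independent clustering of the held-out half. A feature's score is the drop in that agreement when the predictor no longer sees the feature. All function names are hypothetical.

```python
import itertools
import numpy as np

# Sketch only: k-means + nearest-centroid prediction are assumptions, not the paper's setup.

def nearest(X, centers):
    """Index of the closest center for every row of X."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = nearest(X, centers)
        new = np.array([X[labels == c].mean(0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return nearest(X, centers), centers

def matched_accuracy(pred, ref, k):
    """Label agreement, maximised over all relabelings of `pred`."""
    return max(np.mean(np.array(perm)[pred] == ref)
               for perm in itertools.permutations(range(k)))

def cluster_loco_split(X, k, seed=0):
    """Hypothetical LOCO-style score per feature: drop in held-out agreement."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    X_tr, X_te = X[idx[: len(X) // 2]], X[idx[len(X) // 2:]]
    _, centers = kmeans(X_tr, k)   # clusters learned on the training half
    ref, _ = kmeans(X_te, k)       # independent clustering of the held-out half

    def generalizability(keep):
        # nearest-centroid prediction using only the kept features
        pred = nearest(X_te[:, keep], centers[:, keep])
        return matched_accuracy(pred, ref, k)

    p = X.shape[1]
    base = generalizability(list(range(p)))
    return np.array([base - generalizability([f for f in range(p) if f != j])
                     for j in range(p)])

rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
X = np.column_stack([signal, rng.normal(size=400), rng.normal(size=400)])
print(cluster_loco_split(X, k=2))  # feature 0 (the only informative one) should score highest
```

Swapping in any other clustering routine for `kmeans` leaves the scoring logic unchanged, which is the model-agnostic property the abstract emphasizes.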

cluster loco, data mining, machine learning

arXiv.org Machine Learning

2606.14592

Country: North America > United States > New York (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Neural Information Processing Systems, Apr-30-2026, 02:55:51 GMT

Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.
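
To make "a kernel matrix estimated on incomplete data" concrete, here is a minimal sketch. It estimates squared distances from the coordinates observed in both samples, rescaled by the number of shared coordinates (an assumption of ours), and builds a Gaussian kernel from them. It then repairs the kernel by clipping negative eigenvalues, which gives the nearest positive semidefinite matrix in Frobenius norm. This clipping is a generic stand-in: the paper's kernel correction method and its theoretical guarantee are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

# Sketch only: distance rescaling and eigenvalue clipping are illustrative assumptions.

def masked_sq_distances(X):
    """Squared Euclidean distances using only coordinates observed in both rows,
    rescaled by p / (#shared coordinates). NaN marks a missing entry."""
    n, p = X.shape
    obs = ~np.isnan(X)
    Z = np.nan_to_num(X)                  # missing values -> 0, masked out below
    D = np.empty((n, n))
    for i in range(n):
        both = obs[i] & obs               # (n, p): observed in row i and in row j
        sq = ((Z[i] - Z) ** 2 * both).sum(axis=1)
        m = both.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            D[i] = np.where(m > 0, sq * p / m, np.inf)   # no overlap -> infinitely far
    return D

def gaussian_kernel(D, sigma=1.0):
    return np.exp(-D / (2 * sigma ** 2))

def nearest_psd(K):
    """Frobenius-nearest positive semidefinite matrix: clip negative eigenvalues."""
    K = (K + K.T) / 2
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
mask = rng.random(X.shape) < 0.3
mask[:, 0] = False                        # keep one shared coordinate so every pair overlaps
X[mask] = np.nan
K = gaussian_kernel(masked_sq_distances(X), sigma=2.0)
K_fixed = nearest_psd(K)
print(np.linalg.eigvalsh(K).min(), np.linalg.eigvalsh(K_fixed).min())
```

A kernel built from pairwise-estimated distances need not be positive semidefinite, which is why some correction step is needed before it is used as a spectral-clustering affinity.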

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country: North America > United States (0.94)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Yu, Fangchen

Neural Information Processing Systems, Feb-17-2026, 16:32:19 GMT

These techniques calibrate an initial non-metric distance matrix estimated on incomplete data to a distance metric.
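
The excerpt refers to distance-calibration baselines, which turn a non-metric distance matrix estimated on incomplete data into a proper metric. The excerpt does not say which calibration methods were compared, so the following is only an illustration of one simple option: decrease-only repair. It replaces every entry by its shortest-path distance in the complete graph weighted by the matrix, which enforces the triangle inequality. It assumes nonnegative entries, and `shortest_path_repair` is a hypothetical name.

```python
import numpy as np

# Sketch only: one simple calibration, not necessarily a method used in the paper.

def shortest_path_repair(D):
    """Decrease-only metric repair: each entry becomes the shortest-path length
    in the complete graph weighted by D, so the triangle inequality holds."""
    M = np.array(D, dtype=float)
    M = np.minimum(M, M.T)                 # symmetrise by keeping the smaller entry
    np.fill_diagonal(M, 0.0)
    for k in range(len(M)):                # Floyd-Warshall relaxation through node k
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M

D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])            # d(0,2) > d(0,1) + d(1,2): not a metric
print(shortest_path_repair(D))             # d(0,2) becomes 2.0
```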

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Neural Information Processing Systems, Dec-27-2025, 01:25:19 GMT

Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.

incomplete data, kernel correction and affinity learning, spectral clustering

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Fully Bayesian Spectral Clustering and Benchmarking with Uncertainty Quantification for Small Area Estimation

Fúquene-Patiño, Jairo

arXiv.org Machine Learning, Dec-18-2025

In this work, inspired by machine learning techniques, we propose a new Bayesian model for Small Area Estimation (SAE), the Fay-Herriot model with Spectral Clustering (FH-SC). Unlike traditional approaches, clustering in FH-SC is based on spectral clustering algorithms that utilize external covariates, rather than geographical or administrative criteria. A major advantage of the FH-SC model is its flexibility in integrating existing SAE approaches, with or without clustering random effects. To enable benchmarking, we leverage the theoretical framework of posterior projections for constrained Bayesian inference and derive closed-form expressions for the new Rao-Blackwell (RB) estimators of the posterior mean under the FH-SC model. Additionally, we introduce a novel measure of uncertainty for the benchmarked estimator, the Conditional Posterior Mean Square Error (CPMSE), which is generalizable to other Bayesian SAE estimators. We conduct model-based and data-based simulation studies to evaluate the frequentist properties of the CPMSE. The proposed methodology is motivated by a real case study involving the estimation of the proportion of households with internet access in the municipalities of Colombia. Finally, we illustrate the advantages of FH-SC over existing Bayesian and frequentist approaches through our case study.
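
For readers new to SAE, the following sketches the basic Fay-Herriot model that FH-SC builds on: $y_i = \theta_i + e_i$, $\theta_i = x_i^\top \beta + u_i$, with sampling variance $\psi_i$ and area-effect variance $\sigma_u^2$. It also shows what a benchmarking constraint does. The sketch treats the variance components as known (in practice they are estimated) and omits the spectral-clustering random effects entirely. It uses a simple least-squares benchmarking adjustment rather than the paper's posterior-projection Rao-Blackwell estimators. The data and the population shares are made up, and the function names are hypothetical.

```python
import numpy as np

# Sketch only: classical Fay-Herriot shrinkage + least-squares benchmarking, not FH-SC.

def fay_herriot_predictor(y, X, psi, sigma2_u):
    """Shrinkage predictor for theta_i under y_i = theta_i + e_i, theta_i = x_i'beta + u_i,
    with sampling variances psi_i and area-effect variance sigma2_u treated as known."""
    w = 1.0 / (sigma2_u + psi)                     # GLS weights
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)     # GLS estimate of beta
    gamma = sigma2_u / (sigma2_u + psi)            # per-area shrinkage factor
    return gamma * y + (1.0 - gamma) * (X @ beta)

def benchmark(theta, weights, target):
    """Smallest (least-squares) additive adjustment so that weights @ result == target."""
    gap = target - weights @ theta
    return theta + weights * gap / (weights @ weights)

rng = np.random.default_rng(0)
m = 8
X = np.column_stack([np.ones(m), rng.uniform(0, 1, m)])   # intercept + one covariate
y = X @ np.array([0.3, 0.4]) + rng.normal(0, 0.05, m)     # direct estimates (proportions)
psi = rng.uniform(0.001, 0.004, m)
theta = fay_herriot_predictor(y, X, psi, sigma2_u=0.002)
shares = np.full(m, 1.0 / m)                               # hypothetical population shares
theta_b = benchmark(theta, shares, target=shares @ y)      # match the direct aggregate
print(theta_b, shares @ theta_b)
```

Areas with large sampling variance $\psi_i$ are pulled harder toward the regression fit; benchmarking then shifts all areas so that their weighted total agrees with a trusted aggregate.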

estimator, fh-sc, fh-sc model

arXiv.org Machine Learning

2512.15643

Country:

Africa > Sub-Saharan Africa (0.14)
South America > Colombia > La Guajira Department > Riohacha (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

An Improved and Generalised Analysis for Spectral Clustering

Tyler, George, Zanetti, Luca

arXiv.org Artificial Intelligence, Dec-1-2025

We revisit the theoretical performance of Spectral Clustering, a classical algorithm for graph partitioning that relies on the eigenvectors of a matrix representation of the graph. Informally, we show that Spectral Clustering works well as long as the smallest eigenvalues appear in groups well separated from the rest of the matrix representation's spectrum. This arises, for example, whenever there exists a hierarchy of clusters at different scales, a regime not captured by previous analyses. Our results are very general and can be applied beyond the traditional graph Laplacian. In particular, we study Hermitian representations of digraphs and show Spectral Clustering can recover partitions where edges between clusters are oriented mostly in the same direction. This has applications in, for example, the analysis of trophic levels in ecological networks. We demonstrate that our results accurately predict the performance of Spectral Clustering on synthetic and real-world data sets.
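
The "well-separated group of smallest eigenvalues" condition can be seen on a toy graph. With k well-separated groups, the k smallest eigenvalues of the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ lie near zero, followed by a clear gap. The sketch below builds three 5-node cliques joined by weak edges (weight 0.01, our choice), reads k off the largest gap, and forms a row-normalized spectral embedding. It covers only the classical undirected setting, not the paper's Hermitian digraph analysis, and the function names are our own.

```python
import numpy as np

# Sketch only: eigengap heuristic on an undirected toy graph.

def normalized_laplacian(A):
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def eigengap_k(A, k_max=10):
    """Number of clusters = position of the largest gap among the smallest eigenvalues."""
    vals = np.linalg.eigvalsh(normalized_laplacian(A))   # ascending order
    return int(np.argmax(np.diff(vals[: k_max + 1]))) + 1

def spectral_embedding(A, k):
    """Rows of the k bottom eigenvectors, each scaled to unit length."""
    _, vecs = np.linalg.eigh(normalized_laplacian(A))
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def cliques_with_weak_ring(sizes, eps=0.01):
    """Disjoint cliques, consecutive cliques joined by a single edge of weight eps."""
    n = sum(sizes)
    A = np.zeros((n, n))
    starts = np.cumsum([0] + list(sizes[:-1]))
    for s, m in zip(starts, sizes):
        A[s:s + m, s:s + m] = 1.0
    np.fill_diagonal(A, 0.0)
    for b, (s, m) in enumerate(zip(starts, sizes)):
        t = starts[(b + 1) % len(sizes)]                  # first node of the next clique
        A[s + m - 1, t] = A[t, s + m - 1] = eps
    return A

A = cliques_with_weak_ring([5, 5, 5])
print(eigengap_k(A))   # 3
```

Running k-means on the rows of `spectral_embedding(A, 3)` would finish the classical pipeline; in this example rows from the same clique are nearly identical and rows from different cliques are nearly orthogonal.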

data mining, machine learning, spectral clustering

arXiv.org Artificial Intelligence

2511.23261

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Communications (0.93)
Information Technology > Data Science > Data Mining (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Advanced spectral clustering for heterogeneous data in credit risk monitoring systems

Han, Lu, Li, Mengyan, Qiang, Jiping, Su, Zhi

arXiv.org Artificial Intelligence, Sep-3-2025

Heterogeneous data, which encompass both numerical financial variables and textual records, present substantial challenges for credit monitoring. To address this issue, we propose Advanced Spectral Clustering (ASC), a method that integrates financial and textual similarities through an optimized weight parameter and selects eigenvectors using a novel eigenvalue-silhouette optimization approach. Evaluated on a dataset comprising 1,428 small and medium-sized enterprises (SMEs), ASC achieves a Silhouette score that is 18% higher than that of a single-type data baseline method. Furthermore, the resulting clusters offer actionable insights; for instance, 51% of low-risk firms are found to include the term 'social recruitment' in their textual records. The robustness of ASC is confirmed across multiple clustering algorithms, including k-means, k-medians, and k-medoids, with ΔIntra/Inter < 0.13 and ΔSilhouette Coefficient < 0.02. By bridging spectral clustering theory with heterogeneous data applications, ASC enables the identification of meaningful clusters, such as recruitment-focused SMEs exhibiting a 30% lower default risk, thereby supporting more targeted and effective credit interventions.
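
The core construction in this abstract is a weighted combination of a numerical-feature similarity and a text similarity, and it can be sketched as follows. The Gaussian kernel, cosine similarity over word counts, two-way Fiedler-vector split, and toy data are all our assumptions. ASC's optimized weight and its eigenvalue-silhouette eigenvector selection are not reproduced. The `silhouette` function is the standard silhouette coefficient, the quantity the abstract reports. All function names are hypothetical.

```python
import numpy as np

# Sketch only: weighted similarity fusion + 2-way spectral split, not the ASC algorithm.

def gaussian_similarity(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def cosine_similarity(T):
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    U = T / np.where(norms == 0, 1.0, norms)           # avoid dividing empty documents by 0
    return np.clip(U @ U.T, 0.0, 1.0)

def fused_affinity(X_num, T_text, alpha, sigma=1.0):
    """alpha weights numerical similarity, (1 - alpha) weights text similarity."""
    return alpha * gaussian_similarity(X_num, sigma) + (1 - alpha) * cosine_similarity(T_text)

def fiedler_split(W):
    """Two-way spectral split by the sign of the second normalized-Laplacian eigenvector."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (singletons score 0)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)
            continue
        a = D[i, same].mean()                           # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
X_num = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])
T = np.zeros((20, 6))
T[:10, :3] = rng.integers(1, 4, (10, 3))                # group A uses words 0-2
T[10:, 3:] = rng.integers(1, 4, (10, 3))                # group B uses words 3-5
labels = fiedler_split(fused_affinity(X_num, T, alpha=0.5, sigma=1.5))
print(labels, silhouette(X_num, labels))
```

A grid search over `alpha` scored by silhouette would be one simple way to "optimize the weight", but the abstract does not state that this is how ASC chooses it.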

data mining, machine learning, spectral

arXiv.org Artificial Intelligence

2509.00546

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.68)

Industry:

Banking & Finance > Credit (1.00)
Banking & Finance > Risk Management (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data

Baligodugula, Vishnu Vardhan, Amsaad, Fathi

arXiv.org Machine Learning, Mar-29-2025

This paper presents a comprehensive comparative analysis of prominent clustering algorithms (K-means, DBSCAN, and Spectral Clustering) on high-dimensional datasets. We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques (PCA, t-SNE, and UMAP) using diverse quantitative metrics. Experiments conducted on MNIST, Fashion-MNIST, and UCI HAR datasets reveal that preprocessing with UMAP consistently improves clustering quality across all algorithms, with Spectral Clustering demonstrating superior performance on complex manifold structures. Our findings show that algorithm selection should be guided by data characteristics, with K-means excelling in computational efficiency, DBSCAN in handling irregular clusters, and Spectral Clustering in capturing complex relationships. This research contributes a systematic approach for evaluating and selecting clustering techniques for high-dimensional data applications.
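
The abstract does not list its metrics. The adjusted Rand index is one common external metric for scoring a clustering against known labels such as the MNIST digit classes, and it can be computed from the contingency table as below. Treat it as an illustrative choice. `adjusted_rand_index` and `pairs` are our own names; scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
import numpy as np
from math import comb

# Sketch only: ARI as one possible evaluation metric; the paper's metric set is not specified.

def pairs(counts):
    """Number of unordered pairs within each count, summed."""
    return sum(comb(int(c), 2) for c in counts)

def adjusted_rand_index(labels_true, labels_pred):
    _, a = np.unique(np.asarray(labels_true), return_inverse=True)
    _, b = np.unique(np.asarray(labels_pred), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)                   # contingency table
    index = pairs(table.ravel())
    rows, cols = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = rows * cols / comb(len(a), 2)
    max_index = (rows + cols) / 2
    if max_index == expected:                     # degenerate case, e.g. a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0: same partition, labels renamed
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # approximately -0.5: worse than chance
```

Because ARI ignores label names and is corrected for chance, it compares fairly across K-means, DBSCAN, and Spectral Clustering outputs with different numbers of clusters.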

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

2503.23215

Country:

Asia > Singapore (0.05)
Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Multi-View Spectral Clustering for Graphs with Multiple View Structures

Tsitsikas, Yorgos, Papalexakis, Evangelos E.

arXiv.org Artificial Intelligence, Jan-28-2025

Despite the fundamental importance of clustering, much of the relevant research is still based on ambiguous foundations, leaving it unclear whether or how the various clustering methods are connected with each other. In this work, we provide an additional stepping stone towards resolving such ambiguities by presenting a general clustering framework that subsumes a series of seemingly disparate clustering methods, including various methods belonging to the widely popular spectral clustering framework. The framework is also general enough to shed light on the largely unexplored area of multi-view graphs where each view may have differently clustered nodes. In turn, we propose GenClus: a method that is simultaneously an instance of this framework and a generalization of spectral clustering, while also being closely related to k-means. This results in a principled alternative to the few existing methods studying this special type of multi-view graphs. We then conduct in-depth experiments, which demonstrate that GenClus is more computationally efficient than existing methods while attaining similar or better clustering performance. Lastly, a qualitative real-world case study further demonstrates the ability of GenClus to produce meaningful clusterings.
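
As background for the connection between spectral clustering and k-means mentioned in this abstract, recall the standard spectral relaxation of k-means. This is a classical result, not GenClus itself. Take a data matrix $X \in \mathbb{R}^{n \times d}$ with one point $x_i$ per row, clusters $C_1, \dots, C_k$ with means $\mu_c$, and the normalized indicator matrix $H$ with $H_{ic} = 1/\sqrt{|C_c|}$ if $i \in C_c$ and $0$ otherwise. Then:

$$\sum_{c=1}^{k} \sum_{i \in C_c} \lVert x_i - \mu_c \rVert^2 = \operatorname{tr}(X X^\top) - \operatorname{tr}(H^\top X X^\top H), \qquad H^\top H = I_k.$$

Minimizing the k-means objective is therefore equivalent to maximizing $\operatorname{tr}(H^\top X X^\top H)$. If the indicator structure is dropped and only $H^\top H = I_k$ is kept, the maximum is attained by the top-$k$ eigenvectors of $X X^\top$ (Ky Fan's theorem), which is a spectral step. Replacing $X X^\top$ with a normalized affinity matrix such as $D^{-1/2} A D^{-1/2}$ gives the usual normalized spectral-clustering eigenproblem.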

artificial intelligence, graph, machine learning

arXiv.org Artificial Intelligence

2501.11422

Country:

Asia (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > California > Riverside County > Riverside (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Neural Information Processing Systems, Jan-20-2025, 01:05:26 GMT

Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension.

incomplete data, kernel correction and affinity learning, spectral clustering

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)