AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Graph Clustering With Missing Data: Convex Algorithms and Analysis

Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi

Neural Information Processing SystemsFeb-9-2025, 02:36:22 GMT

We consider the problem of finding clusters in an unweighted graph, when the graph is partially observed. We analyze two programs, one which works for dense graphs and one which works for both sparse and dense graphs, but requires some a priori knowledge of the total cluster size, that are based on the convex optimization approach for low-rank matrix recovery using nuclear norm minimization. For the commonly used Stochastic Block Model, we obtain explicit bounds on the parameters of the problem (size and sparsity of clusters, the amount of observed data) and the regularization parameter characterize the success and failure of the programs. We corroborate our theoretical findings through extensive simulations. We also run our algorithm on a real data set obtained from crowdsourcing an image classification task on the Amazon Mechanical Turk, and observe significant performance improvement over traditional methods such as k-means.

artificial intelligence, machine learning, program 1, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Multi-Scale Spectral Decomposition of Massive Graphs

Si Si, Donghyuk Shin, Inderjit S. Dhillon, Beresford N. Parlett

Neural Information Processing SystemsFeb-9-2025, 00:57:10 GMT

Computing the k dominant eigenvalues and eigenvectors of massive graphs is a key operation in numerous machine learning applications; however, popular solvers suffer from slow convergence, especially when k is reasonably large. In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently. We show theoretically as well as empirically that the union of all cluster's subspaces has significant overlap with the dominant subspace of the original graph, provided that the graph is clustered appropriately. Thus, eigenvectors of the clusters serve as good initializations to a block Lanczos algorithm that is used to compute spectral decomposition of the original graph. We further use hierarchical clustering to speed up the computation and adopt a fast early termination strategy to compute quality approximations. Our method outperforms widely used solvers in terms of convergence speed and approximation quality. Furthermore, our method is naturally parallelizable and exhibits significant speedups in shared-memory parallel settings. For example, on a graph with more than 82 million nodes and 3.6 billion edges, MSEIGS takes less than 3 hours on a single-core machine while Randomized SVD takes more than 6 hours, to obtain a similar approximation of the top-50 eigenvectors. Using 16 cores, we can reduce this time to less than 40 minutes.

artificial intelligence, graph, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Minnesota (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Add feedback

Revisiting Dynamic Graph Clustering via Matrix Factorization

Li, Dongyuan, Kosugi, Satoshi, Zhang, Ying, Okumura, Manabu, Xia, Feng, Jiang, Renhe

arXiv.org Machine LearningFeb-9-2025

Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing the evolutionary mechanisms of complex real-world dynamic systems. Matrix factorization-based methods are promising approaches for this task; however, these methods often struggle with scalability and can be time-consuming when applied to large-scale dynamic graphs. Moreover, they tend to lack robustness and are vulnerable to real-world noisy data. To address these issues, we make three key contributions. First, to improve scalability, we propose temporal separated matrix factorization, where a single matrix is divided into multiple smaller matrices for independent factorization, resulting in faster computation. Second, to improve robustness, we introduce bi-clustering regularization, which jointly optimizes graph embedding and clustering, thereby filtering out noisy features from the graph embeddings. Third, to further enhance effectiveness and efficiency, we propose selective embedding updating, where we update only the embeddings of dynamic nodes while the embeddings of static nodes are fixed among different timestamps. Experimental results on six synthetic and five real-world benchmarks demonstrate the scalability, robustness and effectiveness of our proposed method. Source code is available at https://github.com/Clearloveyuan/DyG-MF.

artificial intelligence, graph, machine learning, (14 more...)

arXiv.org Machine Learning

doi: 10.1145/3696410.3714646

2502.06117

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
Oceania > Australia > New South Wales > Sydney (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Nested subspace learning with flags

Szwagier, Tom, Pennec, Xavier

arXiv.org Machine LearningFeb-9-2025

Many machine learning methods look for low-dimensional representations of the data. The underlying subspace can be estimated by first choosing a dimension $q$ and then optimizing a certain objective function over the space of $q$-dimensional subspaces (the Grassmannian). Trying different $q$ yields in general non-nested subspaces, which raises an important issue of consistency between the data representations. In this paper, we propose a simple trick to enforce nestedness in subspace learning methods. It consists in lifting Grassmannian optimization problems to flag manifolds (the space of nested subspaces of increasing dimension) via nested projectors. We apply the flag trick to several classical machine learning methods and show that it successfully addresses the nestedness issue.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

2502.06022

Country:

Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering from Labels and Time-Varying Graphs

Shiau Hong Lim, Yudong Chen, Huan Xu

Neural Information Processing SystemsFeb-8-2025, 23:39:44 GMT

We present a general framework for graph clustering where a label is observed to each pair of nodes. This allows a very rich encoding of various types of pairwise interactions between nodes. We propose a new tractable approach to this problem based on maximum likelihood estimator and convex optimization. We analyze our algorithm under a general generative model, and provide both necessary and sufficient conditions for successful recovery of the underlying clusters. Our theoretical results cover and subsume a wide range of existing graph clustering results including planted partition, weighted clustering and partially observed graphs. Furthermore, the result is applicable to novel settings including time-varying graphs such that new insights can be gained on solving these problems. Our theoretical findings are further supported by empirical results on both synthetic and real data.

artificial intelligence, graph, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Extracting Certainty from Uncertainty: Transductive Pairwise Classification from Pairwise Similarities

Tianbao Yang, Rong Jin

Neural Information Processing SystemsFeb-8-2025, 17:12:54 GMT

We propose a very simple yet effective algorithm that consists of two simple steps: the first step is to complete the sub-matrix corresponding to the labeled examples and the second step is to reconstruct the label matrix from the completed sub-matrix and the provided similarity matrix. Our analysis exhibits that under several mild preconditions we can recover the label matrix with a small error, if the top eigen-space that corresponds to the largest eigenvalues of the similarity matrix covers well the column space of label matrix and is subject to a low coherence, and the number of observed pairwise labels is sufficiently enough. We demonstrate the effectiveness of the proposed algorithm by several experiments.

artificial intelligence, machine learning, matrix, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Recovery of Coherent Data via Low-Rank Dictionary Pursuit

Guangcan Liu, Ping Li

Neural Information Processing SystemsFeb-8-2025, 15:57:07 GMT

The recently established RPCA [4] method provides a convenient way to restore low-rank matrices from grossly corrupted observations. While elegant in theory and powerful in reality, RPCA is not an ultimate solution to the low-rank matrix recovery problem. Indeed, its performance may not be perfect even when data are strictly low-rank. This is because RPCA ignores clustering structures of the data which are ubiquitous in applications. As the number of cluster grows, the coherence of data keeps increasing, and accordingly, the recovery performance of RPCA degrades.

artificial intelligence, machine learning, matrix, (16 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification

Yingzhen Yang, Feng Liang, Shuicheng Yan, Zhangyang Wang, Thomas S. Huang

Neural Information Processing SystemsFeb-8-2025, 15:32:38 GMT

The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification. This pairwise clustering framework learns an unsupervised nonparametric classifier from each data partition, and search for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, i.e. the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are the sum of nonparametric pairwise similarity terms between the data points for the purpose of clustering. Under uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plugin classifier is asymptotically equal to the weighted volume of cluster boundary [1] for Low Density Separation, a widely used criteria for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.

artificial intelligence, classifier, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.90)

Add feedback

Review for NeurIPS paper: Fair Hierarchical Clustering

Neural Information Processing SystemsFeb-8-2025, 01:48:22 GMT

Additional Feedback: Line 68: Kleindessner et al. designed an algorithm for k-center with different type of fairness requirement. Instead of balancing different colors in each cluster, the goal is to pick centers (proportionally) from different colors. It is basically k-center under partition matroid. Line 69-70: In a(n almost) concurrent work, the fair correlation was also studied by Ahamdi et al. Line 131: Bounded representation: with binary colors, it is the same as balance.

different color, fair hierarchical clustering, neurips paper

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Add feedback

Review for NeurIPS paper: Fair Hierarchical Clustering

Neural Information Processing SystemsFeb-8-2025, 01:48:15 GMT

The paper considers hierarchical clustering with fairness constraints and show the concept of fairlet decomposition extends to fair hierarchical clustering as well. The notion of fairness is important though it would be good if the authors improve the motivation section of this work.

fair hierarchical clustering, neurips paper

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback