AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Planar Ultrametrics for Image Segmentation

Neural Information Processing SystemsMar-12-2024, 22:14:50 GMT

We study the problem of hierarchical clustering on planar graphs. We formulate this in terms of finding the closest ultrametric to a specified set of distances and solve it using an LP relaxation that leverages minimum cost perfect matching as a subroutine to efficiently explore the space of planar partitions. We apply our algorithm to the problem of hierarchical image segmentation.

constraint, graph, segmentation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Add feedback

Matrix Completion with Noisy Side Information ∗ ∗ University of Texas at Austin

Neural Information Processing SystemsMar-12-2024, 20:28:45 GMT

We study the matrix completion problem with side information. Side information has been considered in several matrix completion applications, and has been empirically shown to be useful in many cases. Recently, researchers studied the effect of side information for matrix completion from a theoretical viewpoint,showing that sample complexity can be significantly reduced given completely clean features. However, since in reality most given features are noisy or only weakly informative, the development of a model to handle a general feature set, and investigation of how much noisy features can help matrix recovery, remains an important issue. In this paper, we propose a novel model that balances between features and observations simultaneously in order to leverage feature information yet be robust to feature noise. Moreover, we study the effect of general features in theory and show that by using our model, the sample complexity can be lower than matrix completion as long as features are sufficiently informative.

complexity, information, matrix completion, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.40)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Differentially Private Subspace Clustering

Neural Information Processing SystemsMar-12-2024, 20:28:26 GMT

Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It is originally motivated by 3D motion segmentation in computer vision, but has recently been generically applied to a wide range of statistical machine learning problems, which often involves sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other one asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for the agnostic subspace clustering and the graph connectivity problem which might be of independent interests.

artificial intelligence, machine learning, subspace, (15 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Montana (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

General Tensor Spectral Co-clustering for Higher-Order Data

Neural Information Processing SystemsMar-12-2024, 20:00:18 GMT

Spectral clustering and co-clustering are well-known techniques in data analysis, and recent work has extended spectral clustering to square, symmetric tensors and hypermatrices derived from a network. We develop a new tensor spectral co-clustering method that simultaneously clusters the rows, columns, and slices of a nonnegative three-mode tensor and generalizes to tensors with any number of modes. The algorithm is based on a new random walk model which we call the super-spacey random surfer. We show that our method out-performs state-of-the-art co-clustering methods on several synthetic datasets with ground truth clusters and then use the algorithm to analyze several real-world datasets.

markov chain, random walk, tensor, (14 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.28)
Oceania (0.04)
North America > United States > New York (0.04)
(7 more...)

Genre: Research Report (0.68)

Industry: Transportation > Air (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Add feedback

Clustering with Bregman Divergences: an Asymptotic Analysis

Neural Information Processing SystemsMar-12-2024, 17:14:54 GMT

Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the limiting distribution of the centers as k, extending well-known results for k-means clustering.

bregman divergence, quantization, quantization error, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

On Robustness of Kernel Clustering

Neural Information Processing SystemsMar-12-2024, 16:43:47 GMT

Clustering is an important unsupervised learning problem in machine learning and statistics. Among many existing algorithms, kernel k-means has drawn much research attention due to its ability to find non-linear cluster boundaries and its inherent simplicity. There are two main approaches for kernel k-means: SVD of the kernel matrix and convex relaxations. Despite the attention kernel clustering has received both from theoretical and applied quarters, not much is known about robustness of the methods. In this paper we first introduce a semidefinite programming relaxation for the kernel clustering problem, then prove that under a suitable model specification, both K-SVD and SDP approaches are consistent in the limit, albeit SDP is strongly consistent, i.e. achieves exact recovery, whereas K-SVD is weakly consistent, i.e. the fraction of misclassified nodes vanish. Also the error bounds suggest that SDP is more resilient towards outliers, which we also demonstrate with experiments.

eigenvector, matrix, outlier, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

Statistical Inference for Cluster Trees

Neural Information Processing SystemsMar-12-2024, 15:59:03 GMT

A cluster tree provides a highly-interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyze their properties and assess their suitability for inference. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we illustrate the proposed methods on a variety of synthetic examples and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.

cluster tree, distortion metric, inference, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Optimal Cluster Recovery in the Labeled Stochastic Block Model

Neural Information Processing SystemsMar-12-2024, 15:45:49 GMT

We consider the problem of community detection or clustering in the labeled Stochastic Block Model (LSBM) with a finite number K of clusters of sizes linearly growing with the global population of items n. Every pair of items is labeled independently at random, and label l appears with probability p(i, j, l) between two items in clusters indexed by i and j, respectively. The objective is to reconstruct the clusters from the observation of these random labels. Clustering under the SBM and their extensions has attracted much attention recently. Most existing work aimed at characterizing the set of parameters such that it is possible to infer clusters either positively correlated with the true clusters, or with a vanishing proportion of misclassified items, or exactly matching the true clusters. We find the set of parameters such that there exists a clustering algorithm with at most s misclassified items in average under the general LSBM and for any s = o(n), which solves one open problem raised in [2]. We further develop an algorithm, based on simple spectral methods, that achieves this fundamental performance limit within O(npolylog(n)) computations and without the a-priori knowledge of the model parameters.

algorithm, high probability, recovery, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Add feedback

Graphons, mergeons, and so on!

Neural Information Processing SystemsMar-12-2024, 15:43:45 GMT

In this work we develop a theory of hierarchical clustering for graphs. Our modeling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks. Graphons are a far richer class of graph models than stochastic blockmodels, the primary setting for recent progress in the statistical theory of graph clustering. We define what it means for an algorithm to produce the "correct" clustering, give sufficient conditions in which a method is statistically consistent, and provide an explicit algorithm satisfying these properties.

cluster tree, graph, graphon, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Add feedback

Learning Deep Parsimonious Representations Renjie Liao 1, Richard S. Zemel

Neural Information Processing SystemsMar-12-2024, 15:43:16 GMT

In this paper we aim at facilitating generalization for deep networks while supporting interpretability of the learned representations. Towards this goal, we propose a clustering based regularization that encourages parsimonious representations. Our k-means style objective is easy to optimize and flexible, supporting various forms of clustering, such as sample clustering, spatial clustering, as well as co-clustering. We demonstrate the effectiveness of our approach on the tasks of unsupervised learning, classification, fine grained categorization, and zero-shot learning.

cluster center, neural network, representation, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois (0.04)
North America > United States > California (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback