AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Learning Prototype Models for Tangent Distance

Neural Information Processing Systems | Apr-6-2023, 18:36:36 GMT

Simard, LeCun & Denker (1993) showed that the performance of nearest-neighbor classification schemes for handwritten character recognition can be improved by incorporating invariance to specific transformations in the underlying distance metric, the so-called tangent distance. The resulting classifier, however, can be prohibitively slow and memory intensive due to the large amount of prototypes that need to be stored and used in the distance comparisons. In this paper we develop rich models for representing large subsets of the prototypes. These models are either used singly per class, or as basic building blocks in conjunction with the K-means clustering algorithm.

learning prototype model, tangent distance, trevor hastie

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
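
To make the tangent distance in the abstract above concrete, here is a minimal sketch of a simplified one-sided variant: the distance from an input `x` to the affine subspace spanned by a prototype and its tangent vectors. The function names, and the choice of the one-sided form rather than the two-sided distance of Simard et al., are assumptions made for illustration and are not taken from the paper.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > eps:  # drop (near-)dependent directions
            basis.append([wi / norm for wi in w])
    return basis

def one_sided_tangent_distance(x, prototype, tangents):
    """Distance from x to the affine subspace prototype + span(tangents).

    Assumption: only the prototype carries a tangent plane (one-sided form).
    """
    residual = [xi - pi for xi, pi in zip(x, prototype)]
    # Remove every component of the residual that a tangent direction explains.
    for b in _orthonormalize(tangents):
        c = _dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(_dot(residual, residual))

# Moving along the tangent direction [1, 0] is "free", so only the
# second coordinate contributes to the distance.
print(one_sided_tangent_distance([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0]]))  # 1.0
```

With an empty list of tangent vectors the function reduces to the ordinary Euclidean distance, which is the baseline the tangent distance is meant to improve on.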

Multidimensional Scaling and Data Clustering

Neural Information Processing Systems | Apr-6-2023, 18:33:27 GMT

Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding a dissimilarity data set in a Euclidean space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data.

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Clustering data through an analogy to the Potts model

Neural Information Processing Systems | Apr-6-2023, 18:23:26 GMT

A new approach for clustering is proposed. This method is based on an analogy to a physical model; the ferromagnetic Potts model at thermal equilibrium is used as an analog computer for this hard optimization problem. We do not assume any structure of the underlying distribution of the data. Phase space of the Potts model is divided into three regions: ferromagnetic, super-paramagnetic and paramagnetic phases. The region of interest is that corresponding to the super-paramagnetic one, where domains of aligned spins appear.

clustering data, potts model, spin-spin correlation function, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Limitations of Self-organizing Maps for Vector Quantization and Multidimensional Scaling

Neural Information Processing Systems | Apr-6-2023, 18:16:20 GMT

The limitations of using self-organizing maps (SOM) for either clustering/vector quantization (VQ) or multidimensional scaling (MDS) are discussed by reviewing recent empirical findings and the relevant theory. SOM's remaining ability of doing both VQ and MDS at the same time is challenged by a new combined technique of online K-means clustering plus Sammon mapping of the cluster centroids. SOM are shown to perform significantly worse in terms of quantization error, in recovering the structure of the clusters and in preserving the topology in a comprehensive empirical study using a series of multivariate normal clustering problems.

limitation, self-organizing map, vector quantization and multidimensional scaling

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
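
For readers unfamiliar with the Sammon mapping named in the abstract above, the criterion it minimizes is usually written as the stress below. The symbols are assumptions introduced here for illustration: $d^{*}_{ij}$ is the distance between items $i$ and $j$ in the original space (here, between cluster centroids) and $d_{ij}$ is the distance between their images in the low-dimensional map.

```latex
E \;=\; \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}
```

Dividing each squared error by $d^{*}_{ij}$ weights small original distances more heavily, so local neighborhood structure is favored over large-scale layout.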

Active Data Clustering

Neural Information Processing Systems | Apr-6-2023, 17:57:04 GMT

Active data clustering is a novel technique for clustering of proximity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The proposed active data sampling strategy is based on the expected value of information, a concept rooted in statistical decision theory. This is considered to be an important step towards the analysis of large-scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.

active data clustering

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

A MCMC Approach to Hierarchical Mixture Modelling

Neural Information Processing Systems | Apr-6-2023, 17:17:44 GMT

There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, where data is generated in a hierarchical tree-structured manner. Markov chain Monte Carlo (MCMC) methods are demonstrated which can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.

hierarchical mixture modelling, mcmc approach

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.81)

Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Neural Information Processing Systems | Apr-6-2023, 17:07:25 GMT

We introduce a new, non-parametric and principled, distance-based clustering method. This method combines a pairwise-based approach with a vector-quantization method which provides a meaningful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examining the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation, and then are extracted using the information bottleneck method. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution.

data clustering, information bottleneck method, markovian relaxation, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.77)
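
The first two steps in the abstract above (turning a distance matrix into a Markov process and tracking the decay of mutual information as it relaxes) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the exponential kernel with scale `lam`, the uniform prior over starting points, and all function names are choices made here, and the information bottleneck extraction step is omitted.

```python
import math

def transition_matrix(dist, lam=1.0):
    """Row-stochastic matrix with P[i][j] proportional to exp(-lam * dist[i][j]).

    Assumption: an exponential kernel; `lam` is a hypothetical scale parameter.
    """
    rows = []
    for d_row in dist:
        weights = [math.exp(-lam * d) for d in d_row]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def mutual_information(p_t):
    """I(start; state after t steps) in bits, assuming a uniform prior on the start."""
    n = len(p_t)
    marginal = [sum(p_t[i][j] for i in range(n)) / n for j in range(n)]
    info = 0.0
    for i in range(n):
        for j in range(n):
            p = p_t[i][j]
            if p > 0.0:
                info += (p / n) * math.log2(p / marginal[j])
    return info

def information_decay(dist, lam=1.0, steps=8):
    """Return [I(1), I(2), ..., I(steps)] for the relaxing Markov chain."""
    p = transition_matrix(dist, lam)
    p_t = p
    curve = []
    for _ in range(steps):
        curve.append(mutual_information(p_t))
        p_t = matmul(p_t, p)  # advance the chain by one more step
    return curve

# Four points on a line forming two tight pairs that lie far apart.
dist = [[0, 1, 10, 11], [1, 0, 9, 10], [10, 9, 0, 1], [11, 10, 1, 0]]
print(information_decay(dist, lam=1.0, steps=6))
```

In this toy example the information about the exact starting point fades quickly, while the information about which pair the walk started in persists much longer; stretches of slow decay like this are what the abstract refers to as quasi-stable structures.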

An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures

Neural Information Processing Systems | Apr-6-2023, 16:52:34 GMT

This paper describes a clustering algorithm for vector quantizers using a "stochastic association model". It offers a new, simple and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve adaptation as efficient as that of the "neural gas" algorithm, which is reported as one of the most efficient clustering methods. The key is to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector.

efficient clustering algorithm, implementation, stochastic association model, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
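
The adaptation rule described in the abstract above, on-line K-means with a random fluctuation added when distortions are evaluated, might look roughly like the sketch below. The Gaussian noise, the fixed learning rate, and every name here are assumptions for illustration; the paper's actual noise model and schedule are not given in the abstract.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def noisy_online_kmeans(data, centers, lr=0.1, noise_scale=0.0, epochs=10, seed=0):
    """On-line K-means where each distortion is perturbed before picking a winner.

    Assumptions: Gaussian noise drawn independently per reference vector and
    a constant learning rate. With noise_scale=0 this is plain on-line K-means.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in centers]  # work on a copy
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            scores = []
            for c in centers:
                noise = rng.gauss(0.0, noise_scale) if noise_scale > 0 else 0.0
                scores.append(squared_distance(x, c) + noise)
            # The reference vector with the smallest noisy distortion wins
            # and is moved a fraction `lr` of the way toward the input.
            winner = min(range(len(centers)), key=scores.__getitem__)
            centers[winner] = [c + lr * (xi - c) for c, xi in zip(centers[winner], x)]
    return centers

data = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
print(noisy_online_kmeans(data, [[1.0, 1.0], [4.0, 4.0]], noise_scale=0.5))
```

Because the noise is independent for each reference vector, a vector that is not the nearest one occasionally wins and gets updated, which gives a soft, rank-dependent form of adaptation without any explicit sorting of distances.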

Spectral Relaxation for K-means Clustering

Neural Information Processing Systems | Apr-6-2023, 16:52:03 GMT

The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.

k-means clustering, spectral relaxation, sum-of-squares cost function, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
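
The reformulation summarized in the abstract above can be written out as follows. The notation is an assumption introduced here: $X = [x_1, \dots, x_n]$ holds the data vectors as columns, $C_j$ is cluster $j$ with $n_j$ members and mean $m_j$, and $Y \in \mathbb{R}^{n \times k}$ is the normalized cluster indicator matrix with $Y_{ij} = 1/\sqrt{n_j}$ if $x_i \in C_j$ and $0$ otherwise, so that $Y^{T} Y = I_k$.

```latex
\begin{aligned}
ss(Y) &= \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - m_j \rVert^{2}
       = \operatorname{trace}(X^{T} X) - \operatorname{trace}(Y^{T} X^{T} X\, Y), \\
\min_{Y} ss(Y) \;&\Longleftrightarrow\; \max_{Y} \operatorname{trace}(Y^{T} X^{T} X\, Y), \\
\max_{Y^{T} Y = I_k} \operatorname{trace}(Y^{T} X^{T} X\, Y) &= \sum_{i=1}^{k} \lambda_i(X^{T} X)
\quad\Longrightarrow\quad
\min_{Y} ss(Y) \;\ge\; \operatorname{trace}(X^{T} X) - \sum_{i=1}^{k} \lambda_i(X^{T} X).
\end{aligned}
```

Here $\lambda_1 \ge \lambda_2 \ge \dots$ are the eigenvalues of the Gram matrix $X^{T} X$. Dropping the indicator structure of $Y$ and keeping only orthonormality is the relaxation; its maximizer is spanned by the top $k$ eigenvectors, which is why the bound on the last line follows.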

On Spectral Clustering: Analysis and an algorithm

Neural Information Processing Systems | Apr-6-2023, 16:43:45 GMT

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well.

algorithm, eigenvector, spectral clustering

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
