Clustering
Learning Prototype Models for Tangent Distance
Simard, LeCun & Denker (1993) showed that the performance of nearest-neighbor classification schemes for handwritten character recognition can be improved by incorporating invariance to spe(cid:173) the so cific transformations in the underlying distance metric - called tangent distance. The resulting classifier, however, can be prohibitively slow and memory intensive due to the large amount of prototypes that need to be stored and used in the distance compar(cid:173) isons. In this paper we develop rich models for representing large subsets of the prototypes. These models are either used singly per class, or as basic building blocks in conjunction with the K-means clustering algorithm.
Multidimensional Scaling and Data Clustering
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial op(cid:173) timization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data.
Clustering data through an analogy to the Potts model
A new approach for clustering is proposed. This method is based on an analogy to a physical model; the ferromagnetic Potts model at thermal equilibrium is used as an analog computer for this hard optimization problem . We do not assume any structure of the un(cid:173) derlying distribution of the data. Phase space of the Potts model is divided into three regions; ferromagnetic, super-paramagnetic and paramagnetic phases. The region of interest is that corresponding to the super-paramagnetic one, where domains of aligned spins ap(cid:173) pear.
Limitations of Self-organizing Maps for Vector Quantization and Multidimensional Scaling
The limitations of using self-organizing maps (SaM) for either clustering/vector quantization (VQ) or multidimensional scaling (MDS) are being discussed by reviewing recent empirical findings and the relevant theory. SaM's remaining ability of doing both VQ and MDS at the same time is challenged by a new combined tech(cid:173) nique of online K-means clustering plus Sammon mapping of the cluster centroids. SaM are shown to perform significantly worse in terms of quantization error, in recovering the structure of the clus(cid:173) ters and in preserving the topology in a comprehensive empirical study using a series of multivariate normal clustering problems.
Active Data Clustering
Active data clustering is a novel technique for clustering of proxim(cid:173) ity data which utilizes principles from sequential experiment design in order to interleave data generation and data analysis. The pro(cid:173) posed active data sampling strategy is based on the expected value of information, a concept rooting in statistical decision theory. This is considered to be an important step towards the analysis of large(cid:173) scale data sets, because it offers a way to overcome the inherent data sparseness of proximity data.
A MCMC Approach to Hierarchical Mixture Modelling
There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, where data is generated in a hierarchical tree-structured manner. Markov chain Monte Carlo (MCMC) methods are demonstrated which can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.
Data Clustering by Markovian Relaxation and the Information Bottleneck Method
We introduce a new, non-parametric and principled, distance based clustering method. This method combines a pairwise based ap(cid:173) proach with a vector-quantization method which provide a mean(cid:173) ingful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures dur(cid:173) ing this relaxation, and then are extracted using the information bottleneck method. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution.
An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures
This paper describes a clustering algorithm for vector quantizers using a "stochastic association model". It offers a new simple and powerful soft- max adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the "neural gas" algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the simi- larity evaluation process for each reference vector.
Spectral Relaxation for K-means Clustering
The popular K-means clustering partitions a data set by minimiz(cid:173) ing a sum-of-squares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by com(cid:173) puting a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vectors can be found by comput(cid:173) ing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.
On Spectral Clustering: Analysis and an algorithm
Despite many empirical successes of spectral clustering methods(cid:173) algorithms that cluster points using eigenvectors of matrices de(cid:173) rived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well.