Clustering
Discriminative Clustering by Regularized Information Maximization
Krause, Andreas, Perona, Pietro, Gomes, Ryan G.
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
Random Projections for $k$-means Clustering
Boutsidis, Christos, Zouzias, Anastasios, Drineas, Petros
This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \RR^{n \times d}$) can be projected into $t = \Omega(k / \eps^2)$ dimensions, for any $\eps \in (0,1/3)$, in $O(n d \lceil \eps^{-2} k/ \log(d) \rceil )$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2+\eps$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
Supervised Clustering
Awasthi, Pranjal, Zadeh, Reza B.
Despite the ubiquity of clustering as a tool in unsupervised learning, there is not yet a consensus on a formal theory, and the vast majority of work in this direction has focused on unsupervised clustering. We study a recently proposed framework for supervised clustering where there is access to a teacher. We give an improved generic algorithm to cluster any concept class in that model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the teacher. We also present and study two natural generalizations of the model. The model assumes that the teacher response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model and give an algorithm for clustering the class of intervals in this noisy model. We also propose a dynamic model where the teacher sees a random subset of the points. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
Towards Property-Based Classification of Clustering Paradigms
Ackerman, Margareta, Ben-David, Shai, Loker, David
Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a โcorrectโ clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinbergs famous impossibility result, while providing a simpler proof.
Classifying Clustering Schemes
Carlsson, Gunnar, Memoli, Facundo
Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality. Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the data set, for example by adding points or by applying functions to it. We show that within this framework, one can prove a theorems analogous to one of J. Kleinberg, in which for example one obtains an existence and uniqueness theorem instead of a non-existence result. We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes it is possible to construct richer families of clustering schemes that exhibit sensitivity to density.
Random Projections for $k$-means Clustering
Boutsidis, Christos, Zouzias, Anastasios, Drineas, Petros
This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \RR^{n \times d}$) can be projected into $t = \Omega(k / \eps^2)$ dimensions, for any $\eps \in (0,1/3)$, in $O(n d \lceil \eps^{-2} k/ \log(d) \rceil )$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2+\eps$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback
While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly-used sentiment datasets.
Stability of Density-Based Clustering
Rinaldo, Alessandro, Singh, Aarti, Nugent, Rebecca, Wasserman, Larry
High density clusters can be characterized by the connected components of a level set $L(\lambda) = \{x:\ p(x)>\lambda\}$ of the underlying probability density function $p$ generating the data, at some appropriate level $\lambda\geq 0$. The complete hierarchical clustering can be characterized by a cluster tree ${\cal T}= \bigcup_{\lambda} L(\lambda)$. In this paper, we study the behavior of a density level set estimate $\widehat L(\lambda)$ and cluster tree estimate $\widehat{\cal{T}}$ based on a kernel density estimator with kernel bandwidth $h$. We define two notions of instability to measure the variability of $\widehat L(\lambda)$ and $\widehat{\cal{T}}$ as a function of $h$, and investigate the theoretical properties of these instability measures.
Dictionary Optimization for Block-Sparse Representations
Rosenblum, Kevin (Technion - Israel Institute of Technology) | Zelnik-Manor, Lihi (Technion - Israel Institute of Technology) | Eldar, Yonina C. (Technion - Israel Institute of Technology)
Recent work has demonstrated that using a carefully designed dictionary instead of a predefined one, can improve the sparsity in jointly representing a class of signals. This has motivated the derivation of learning methods for designing a dictionary which leads to the sparsest representation for a given set of signals. In some applications, the signals of interest can have further structure, so that they can be well approximated by a union of a small number of subspaces (e.g., face recognition and motion segmentation). This implies the existence of a dictionary which enables block-sparse representations of the input signals once its atoms are properly sorted into blocks. In this paper, we propose an algorithm for learning a block-sparsifying dictionary of a given set of signals. We do not require prior knowledge on the association of signals into groups (subspaces). Instead, we develop a method that automatically detects the underlying block structure. This is achieved by iteratively alternating between updating the block structure of the dictionary and updating the dictionary atoms to better fit the data. Our experiments show that for block-sparse data the proposed algorithm significantly improves the dictionary recovery ability and lowers the representation error compared to dictionary learning methods that do not employ block structure.
Hierarchical Clustering Via Localized Diffusion Folders
David, Gil (Yale University) | Averbuch, Amir (Tel-Aviv University) | Coifman, Ronald R. (Yale University)
Data clustering is a common technique for statistical data analysis. It is used in many fields including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. The proposed Localized Diffusion Folders methodology performs hierarchical clustering of high-dimensional datasets. The diffusion folders are multi-level data partitioning into local neighborhoods that are generated by several random selections of data points and folders in a diffusion graph and by defining local diffusion distances between them. This multi-level partitioning defines an improved localized geometry of the data and a localized Markov transition matrix that is used for the next time step in the diffusion process. The result of this clustering method is a bottom-up hierarchical clustering of the data while each level in the hierarchy contains localized diffusion folders of folders from the lower levels. This methodology preserves the local neighborhood of each point while eliminating noisy connections between distinct points and areas in the graph. The performance of the algorithms is demonstrated on real data and it is compared to existing methods.