Goto

Collaborating Authors

 Clustering


Mini-Batch Kernel $k$-means

arXiv.org Artificial Intelligence

We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full batch kernel $k$-means, where $n$ is the dataset size and $b$ is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel $k$-means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of $\widetilde{\Omega}(\max \{\gamma^{4}, \gamma^{2}\} \cdot \epsilon^{-2})$, the algorithm terminates in $O(\gamma^2/\epsilon)$ iterations with high probability, where $\gamma$ bounds the norm of points in feature space and $\epsilon$ is a termination threshold. Our analysis holds for any reasonable center initialization, and when using $k$-means++ initialization, the algorithm achieves an approximation ratio of $O(\log k)$ in expectation. For normalized kernels, such as Gaussian or Laplacian it holds that $\gamma=1$. Taking $\epsilon = O(1)$ and $b=\Theta(\log n)$, the algorithm terminates in $O(1)$ iterations, with each iteration running in $\widetilde{O}(k)$ time.


Reviews: Sparse Embedded k -Means Clustering

Neural Information Processing Systems

The authors proposed a sparse embedded k-means clustering algorithm to improve the running time of matrix multiplication of current proposed randomization projection method under sparse setting of data matrix in the literature. In particular, they demonstrated that their algorithms achieve the calculation time of matrix multiplication of order proportional to the number of non-zeroes entries of data matrix. I think the sparse embedded k-means clustering algorithm is rather interesting; however, it is not a very surprising improvement given the current results from the paper of C. Boutsidis et al. (2015). More specifically, both papers try to approximate low dimension solution for the original solution of K-means problem. To do that, C. Boutsidis et al. (2015) proposed to multiply the data matrix X with a random matrix having entries 1/\sqrt{d'} or -1/\sqrt{d'} where d' is the dimension of the approximation solution.


Reviews: Affinity Clustering: Hierarchical Clustering at Scale

Neural Information Processing Systems

The paper focuses on the development of the field of distributed hierarchical clustering. The authors propose a novel class of algorithms tagged'affinity clustering' that operate on the basis of Boruvka's seminal work on minimal spanning trees and contrast those to linkage clustering algorithms (which are based on Kruskal's work). The authors systematically introduce the theoretical underpinnings of affinity clustering, before proposing'certificates' as a metric to characterise clustering algorithm solutions more generally by assessing the clustered edge weights (cost). Following the theoretical analysis and operationalisation of MapReduce variants of affinity clustering for distributed operation, the quality is assessed empirically using standard datasets with variants of linkage- and affinity-based algorithms, as well as k-means. In addition to the Rand index (as metric for clustering accuracy) the quality of algorithms is assessed based on the ratio of the detected clusters (with balanced cluster sizes considered favourable).


Reviews: Supervising Unsupervised Learning

Neural Information Processing Systems

By considering a probability distribution over a family of supervised datasets, the authors propose to select a clustering algorithm from a finite family of algorithms or to choose the number of clusters among other tasks by solving a supervised learning problem that matches some features of the input dataset to the output dataset. For instance, in the case of selecting the number of clusters, they regress this number from a family of datasets learning a function that gives a "correct" number of clusters. The submission seems technically sound; the authors support the claim of the possibility of agnostic learning in two specific settings with a theoretical analysis: choosing an algorithm from a finite family of algorithms and choosing an algorithm from a family of single-linkage algorithms. Their framework also allows proposing an alternative to the desirable property of Scale-Invariance introduced by Kleinberg (2003) by letting the training datasets to establish a scale; this is translated into the Meta-Scale-Invariance desirable property. The authors then show that, with this version of the Scale-Invariance property, it is possible to learn a clustering algorithm that is also Consistent and Rich (as defined by Kleinberg (2003)).


Reviews: Clustering Redemptionโ€“Beyond the Impossibility of Kleinberg's Axioms

Neural Information Processing Systems

This paper extends Kleinberg's work on the impossibility of clustering. That is, Kleinberg introduced 3 axioms that any clustering procedure should satisfy and then showed that it is impossible to satisfy all three simultaneously. This work suggests a refinement of Kleinberg's 3rd axiom having to do with consistency. In the original axiom, if the clustering distance function is perturbed such that all within-cluster distances do not increase and all across cluster distances do not decrease, then the optimal clustering should be the same with respect to both the perturbed and unperturbed distance functions. The refined consistency axiom proposed in this work says that under the perturbed distance function, the optimal clustering may be different so long as the number of clusters in the optimal partition is different as well.


Reviews: Data-Driven Clustering via Parameterized Lloyd's Families

Neural Information Processing Systems

The paper proposes a generalization of the KMeans Algorithm by introducing non-negative parameters alpha and beta. The motivation is that different instances of clustering problems may cluster well according to different clustering objectives. The optimal parameter configuration of alpha and beta defines an optimal choice, from the proposed family of clustering algorithms. The paper offers several theoretical contributions. It provides guarantees for the number of samples necessary such that the empirically best parameter set yields clustering costs is within epsilon bounds of the optimal parameters. It provides an algorithm for the enumeration of all possible sets of initial centers for any alpha-interval.


Reviews: Query K-means Clustering and the Double Dixie Cup Problem

Neural Information Processing Systems

This paper investigates the problem of active-semi-supervised clustering, by considering both noiseless (perfect oracle) and noisy (imperfect oracle) query responses. The authors provide probabilistic guarantees for low approximation errors to the true optimal k-means objective. The corresponding query complexities are substantially lower than in the existing literature. Importantly, as noted by the authors, their query complexity is independent of the size of the dataset. The main strength of the paper lies in the considerable technical rigour with which the subject has been handled.


Recovery of Coherent Data via Low-Rank Dictionary Pursuit

Neural Information Processing Systems

The recently established RPCA [4] method provides a convenient way to restore low-rank matrices from grossly corrupted observations. While elegant in theory and powerful in reality, RPCA is not an ultimate solution to the low-rank matrix recovery problem. Indeed, its performance may not be perfect even when data are strictly low-rank. This is because RPCA ignores clustering structures of the data which are ubiquitous in applications. As the number of cluster grows, the coherence of data keeps increasing, and accordingly, the recovery performance of RPCA degrades.