Clustering
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper looks at the problem of combining clustering and outlier detection. It is very well written and easy to read. The authors reuse an earlier facilities location without outliers formulation by Charikar et' al and their main contribution is the solution to the problem formulation. The FLO problem was shown to be intractable by the authors of that paper and no approximation algorithm exists that is both i) scalable and ii) comes with guarantees.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","1612" "Title:","Improved Distributed Principal Component Analysis" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper considers the problem of trading off the communication and computation cost of distributed computation and proposes a new distributed k L-2 error fitting algorithm. The proposed algorithm can be seen as a combination of many previous speed up techniques for distributed PCA and clustering methods. However, the authors also contribute optimizations over the base methods and further improves the communication and computation efficiency. The theoretical guarantee is sound and experiments are convincing.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes a new pairwise clustering framework where nonparametric pairwise similarity is derived by minimizing the generalization error unsupervised nonparametric classifier. The proposed framework bridges the gap between clustering and multi-class classification, and explains the widely used kernel similarity for clustering. The authors also prove that the generalization error bound for the unsupervised plug-in classifier is asymptotically equal to the weighted volume of cluster boundary for low density separation. Based on the derived nonparametric pairwise similarity using the plug-in classifier, the authors propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability compared to the exiting exemplar-based clustering methods.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose a novel approach for hierarchical clustering of multivariate data. They construct cluster trees by estimating minimum volume sets using the q-One-Class SVM, and evaluate their method on a synthetic data set and two real word applications. While their new method seems to perform better than other approaches based on density estimation, I am not convinced by the benefits in practical applicability as the authors did not compare their method to the most commonly used hierarchical clustering techniques (agglomerative clustering with average linkage/ward). Minor comment: Rather than splitting their data once in a training and test set, the authors should perform 10-fold/5-fold cross-validation for a more reliable estimation of the generalizability of their method.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper presents a spectral clustering algorithm for the sparse degree regime (degree=\theta(1)) of the stochastic blockmodel. The authors propose an alternative to the non-backtracking operator and point out that this has similar properties as the non-backtracking operator, which has been shown to be useful in the sparse regime. But the proposed data matrix is smaller than the non backtracking operator, and symmetric, therefore making eigenvalue computation easier and more accurate. The sparse regime is indeed the hardest in terms of showing concentration of empirical eigenvalues, or performance of a clustering algorithm in the stochastic blockmodel.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper studies a planted partition model for random m-uniform hypergraphs, and proves the consistency of a natural generalization of spectral clustering. The hypergraph adjacency tensor is (mode-1) flattened to a matrix, from which a normalized Laplacian matrix is formed and the standard spectral partitioning is then applied. The striking feature of the analysis is that the rate of convergence improves as m increases, provided that the number of partitions is small. Some experiments on both synthetic and application derived data are reported, and the proposed method is shown to be relatively effective, especially given its simplicity. The model is well-motivated by applications in computer vision and likely elsewhere.