Goto

Collaborating Authors

 same-cluster query



Optimal Algorithms for Learning Partitions with Faulty Oracles

Neural Information Processing Systems

This models applications where learners crowdsource information from non-expert human workers or conduct noisy experiments to determine group structure. The learner aims to exactly recover a partition by submitting queries of the form "are




Exact Recovery of Mangled Clusters with Same-Cluster Queries

Neural Information Processing Systems

We study the cluster recovery problem in the semi-supervised active clustering framework. Given a finite set of input points, and an oracle revealing whether any two points lie in the same cluster, our goal is to recover all clusters exactly using as few queries as possible. To this end, we relax the spherical $k$-means cluster assumption of Ashtiani et al.\ to allow for arbitrary ellipsoidal clusters with margin. This removes the assumption that the clustering is center-based (i.e., defined through an optimization problem), and includes all those cases where spherical clusters are individually transformed by any combination of rotations, axis scalings, and point deletions. We show that, even in this much more general setting, it is still possible to recover the latent clustering exactly using a number of queries that scales only logarithmically with the number of input points. More precisely, we design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k^3 \ln k \ln n)$ oracle queries and $\widetilde{O}(kn + k^3)$ time to recover the clustering with zero misclassification error. The $O(\cdot)$ notation hides an exponential dependence on the dimensionality of the clusters, which we show to be necessary thus characterizing the query complexity of the problem. Our algorithm is simple, easy to implement, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees. Experiments on large synthetic datasets confirm that we can reconstruct clusterings exactly and efficiently.


Clustering with Same-Cluster Queries

Neural Information Processing Systems

We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of $k$-means clustering (i.e., when the expert conforms to a solution of $k$-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks $O\big(k^2\log k + k\log n)$ same-cluster queries and runs with time complexity $O\big(kn\log n)$ (where $k$ is the number of clusters and $n$ is the number of instances). The success of the algorithm is guaranteed for data satisfying the margin condition under which, without queries, we show that the problem is NP hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.


Clustering with Same-Cluster Queries

Hassan Ashtiani, Shrinu Kushagra, Shai Ben-David

Neural Information Processing Systems

Clustering is a challenging task particularly due to two impediments. The first problem is that clustering, in the absence of domain knowledge, is usually an under-specified task; the solution of choice may vary significantly between different intended applications.




Exact Recovery of Mangled Clusters with Same-Cluster Queries (supplementary material)

Neural Information Processing Systems

We recall Y ao's minimax principle for Monte Carlo algorithms. Then (see [4], Proposition 2.6): max Lemma 8 (20) In the rest of the proof we prove the four lemmata. Let z be the point w.r.t. which the margin of C holds. The proof of the theorem is complete. In this section we show how to compute the separator of Theorem 3. In fact, computing the separator (see Section 5).