Query Complexity of Clustering with Side Information

Oct-2-2024, 15:35:36 GMT–Neural Information Processing Systems

Suppose we are given a set of n elements to be clustered into k (unknown) clusters, and an oracle/expert labeler that can interactively answer pairwise queries of the form, "do two elements u and v belong to the same cluster?". The goal is to recover the optimum clustering by asking the minimum number of queries. In this paper, we provide a rigorous theoretical study of this basic problem of query complexity of interactive clustering, and give strong information-theoretic lower bounds, as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. However, obtaining an ideal similarity function is extremely challenging due to ambiguity in data representation, poor data quality, etc., and this is one of the primary reasons that clustering is hard.
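To make the query model concrete, here is a minimal sketch of a naive baseline, not the algorithm from the paper: each element is compared against one representative of every cluster found so far, which uses at most n·k queries when the oracle is always correct. The function name `cluster_with_oracle`, the `same_cluster` callback, and the toy labels are all assumptions introduced for illustration. The sketch ignores the similarity matrix (side information) entirely.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def cluster_with_oracle(
    elements: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Naive baseline (hypothetical helper): recover clusters from pairwise queries.

    Assumes a perfect oracle. Each new element is queried against the first
    member of every existing cluster until a match is found.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for u in elements:
        for cluster in clusters:
            queries += 1  # one oracle call per representative checked
            if same_cluster(u, cluster[0]):
                cluster.append(u)
                break
        else:
            # No existing cluster matched, so u starts a new one.
            clusters.append([u])
    return clusters, queries


# Toy ground truth standing in for the expert labeler (illustrative only).
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}


def oracle(u: str, v: str) -> bool:
    return labels[u] == labels[v]


clusters, queries = cluster_with_oracle(list(labels), oracle)
print(clusters, queries)
```

On this toy input the baseline asks 6 queries, below the worst-case n·k = 15. The bounds studied in the paper concern how far below this kind of baseline one can go, in particular when a similarity matrix is available as side information.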

algorithm, query, side information, (13 more...)

Country:
- North America > United States
  - Nevada > Clark County
    - Las Vegas (0.05)
  - Massachusetts > Hampshire County
    - Amherst (0.14)
  - California
    - Los Angeles County > Long Beach (0.04)
    - Alameda County > Berkeley (0.04)
- Europe > France
  - Île-de-France > Paris > Paris (0.04)
- Asia > South Korea
  - Seoul > Seoul (0.04)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language > Information Retrieval
      - Query Processing (0.62)
    - Machine Learning > Statistical Learning
      - Clustering (0.48)
