AITopics | noisy oracle

Clustering with Noisy Queries

Neural Information Processing SystemsMar-17-2026, 17:49:11 GMT

In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of $n$ elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form ``do elements $u$ and $v$ belong to the same cluster?''-the

artificial intelligence, machine learning, neural information processing system 30, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

Clustering with Noisy Queries

Neural Information Processing SystemsNov-21-2025, 16:07:42 GMT

In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of $n$ elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form ``do elements $u$ and $v$ belong to the same cluster?''-the

clustering, name change, query, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

Clustering with Noisy Queries

Arya Mazumdar, Barna Saha

Neural Information Processing SystemsNov-21-2025, 13:11:57 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, social media, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Michigan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution Ian Covert

Neural Information Processing SystemsOct-9-2025, 17:52:51 GMT

We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective.

amortization, dataset, feature attribution, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Query-Efficient Correlation Clustering with Noisy Oracle

Neural Information Processing SystemsMay-27-2025, 21:55:27 GMT

We study a general clustering setting in which we have n elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.

algorithm, noisy oracle, query-efficient correlation clustering, (1 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.86)

Add feedback

Clustering with Noisy Queries

Arya Mazumdar, Barna Saha

Neural Information Processing SystemsOct-4-2024, 07:37:34 GMT

In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of n elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form "do elements u and v belong to the same cluster?"-the

algorithm, probability, query, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Technology:

Information Technology > Communications > Social Media (0.95)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

Covert, Ian, Kim, Chanwoo, Lee, Su-In, Zou, James, Hashimoto, Tatsunori

arXiv.org Artificial IntelligenceJan-28-2024

Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and can be intractable for large datasets. These methods require efficient approximations, and learning a network that directly predicts the desired output, which is commonly known as amortization, is a promising solution. However, training such models with exact labels is often intractable; we therefore explore training with noisy labels and find that this is inexpensive and surprisingly effective. Through theoretical analysis of the label noise and experiments with various models and datasets, we show that this approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.

amortization, dataset, feature attribution, (12 more...)

arXiv.org Artificial Intelligence

2401.15866

Country:

Asia > Middle East > Jordan (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Denmark (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Clustering with Noisy Queries

Mazumdar, Arya, Saha, Barna

Neural Information Processing SystemsFeb-14-2020, 17:58:07 GMT

In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of $n$ elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form do elements $u$ and $v$ belong to the same cluster?''-the In this paper, we provide the first information theoretic lower bound on the number of queries for clustering with noisy oracle in both situations. We design novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown.

noisy oracle, noisy query, query, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.44)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Add feedback

Learning in Confusion: Batch Active Learning with Noisy Oracle

Gupta, Gaurav, Sahu, Anit Kumar, Lin, Wan-Yi

arXiv.org Machine LearningSep-26-2019

We study the problem of training machine learning models incrementally using active learning with access to imperfect or noisy oracles. We specifically consider the setting of batch active learning, in which multiple samples are selected as opposed to a single sample as in classical settings so as to reduce the training overhead. Our approach bridges between uniform randomness and score based importance sampling of clusters when selecting a batch of new samples. Experiments on benchmark image classification datasets (MNIST, SVHN, and CIFAR10) shows improvement over existing active learning strategies. We introduce an extra denoising layer to deep networks to make active learning robust to label noises and show significant improvements.

active learning, algorithm, learning, (12 more...)

arXiv.org Machine Learning

1909.12473

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Clustering with Noisy Queries

Mazumdar, Arya, Saha, Barna

Neural Information Processing SystemsDec-31-2017

In this paper, we provide a rigorous theoretical study of clustering with noisy queries. Given a set of $n$ elements, our goal is to recover the true clustering by asking minimum number of pairwise queries to an oracle. Oracle can answer queries of the form ``do elements $u$ and $v$ belong to the same cluster?''-the queries can be asked interactively (adaptive queries), or non-adaptively up-front, but its answer can be erroneous with probability $p$. In this paper, we provide the first information theoretic lower bound on the number of queries for clustering with noisy oracle in both situations. We design novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown. Moreover, we design computationally efficient algorithms both for the adaptive and non-adaptive settings. The problem captures/generalizes multiple application scenarios. It is directly motivated by the growing body of work that use crowdsourcing for {\em entity resolution}, a fundamental and challenging data mining task aimed to identify all records in a database referring to the same entity. Here crowd represents the noisy oracle, and the number of queries directly relates to the cost of crowdsourcing. Another application comes from the problem of sign edge prediction in social network, where social interactions can be both positive and negative, and one must identify the sign of all pair-wise interactions by querying a few pairs. Furthermore, clustering with noisy oracle is intimately connected to correlation clustering, leading to improvement therein. Finally, it introduces a new direction of study in the popular stochastic block model where one has an incomplete stochastic block model matrix to recover the clusters.

algorithm, probability, query, (15 more...)

Neural Information Processing Systems

Country: