A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

Shah, Nihar B.; Balakrishnan, Sivaraman; Wainwright, Martin J.

Jun-30-2016–arXiv.org Machine Learning

The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and the number of tasks to scale, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. Finally, we propose a computationally efficient method, called the OBI-WAN estimator, that is uniformly optimal over a class intermediate between the permutation-based and the Dawid-Skene models, and is uniformly consistent over the entire permutation-based model class. In contrast, the guarantees for estimators available in prior literature are sub-optimal over the original Dawid-Skene model.
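The abstract does not define either model, so the following is only a minimal sketch under our own reading of the setup, intended to make "generalization of the Dawid-Skene model" concrete. We assume binary labels x_j in {-1, +1} and a matrix Q whose entry Q[i][j], in [1/2, 1], is the probability that worker i answers question j correctly. We take the (one-coin) Dawid-Skene model to mean Q[i][j] = q_i, and the permutation-based model to allow any Q that becomes non-decreasing along every row and column after some reordering of workers and questions. Every function name below is hypothetical. Majority vote is used only as a simple baseline; it is not the OBI-WAN estimator, and the plain Hamming error computed here is not the paper's new error metric.

```python
import random


def is_permutation_monotone(Q):
    """Hypothetical check: can rows and columns of Q be permuted so that
    Q is non-decreasing along every row and every column?

    If such permutations exist, rows (and columns) are totally ordered
    entrywise, so sorting by (sum, lexicographic tie-break) recovers a
    valid ordering; if none exist, no ordering passes the final check.
    """
    rows = sorted(Q, key=lambda r: (sum(r), tuple(r)))
    n, d = len(rows), len(rows[0])
    cols = [tuple(r[j] for r in rows) for j in range(d)]
    col_order = sorted(range(d), key=lambda j: (sum(cols[j]), cols[j]))
    M = [[r[j] for j in col_order] for r in rows]
    rows_ok = all(M[i][j] <= M[i][j + 1] for i in range(n) for j in range(d - 1))
    cols_ok = all(M[i][j] <= M[i + 1][j] for i in range(n - 1) for j in range(d))
    return rows_ok and cols_ok


def dawid_skene_Q(abilities, d):
    """One-coin Dawid-Skene (assumed form): worker i is correct w.p. q_i on every question."""
    return [[q] * d for q in abilities]


def permuted_bilinear_Q(a, b, rng):
    """Hypothetical member of the permutation-based class that is not Dawid-Skene
    when b varies: Q[i][j] = 1/2 + a_i * b_j / 2, with rows and columns shuffled."""
    Q = [[0.5 + 0.5 * ai * bj for bj in b] for ai in a]
    rng.shuffle(Q)  # hide the worker ordering
    perm = list(range(len(b)))
    rng.shuffle(perm)  # hide the question ordering
    return [[row[j] for j in perm] for row in Q]


def sample_labels(Q, x_true, rng):
    """Y[i][j] equals x_true[j] with probability Q[i][j], otherwise its negation."""
    return [[x if rng.random() < q else -x for q, x in zip(row, x_true)] for row in Q]


def majority_vote(Y):
    """Baseline aggregator: sign of the column sums (ties go to +1)."""
    d = len(Y[0])
    return [1 if sum(row[j] for row in Y) >= 0 else -1 for j in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 5, 8
    Q = permuted_bilinear_Q([rng.random() for _ in range(n)],
                            [rng.random() for _ in range(d)], rng)
    print(is_permutation_monotone(Q))  # True: the hidden ordering is recoverable
    x_true = [rng.choice([-1, 1]) for _ in range(d)]
    x_hat = majority_vote(sample_labels(Q, x_true, rng))
    print("Hamming error:", sum(u != v for u, v in zip(x_hat, x_true)) / d)
```

Every Dawid-Skene matrix passes the membership check (its rows are constant), while a matrix such as [[0.6, 0.9], [0.9, 0.6]] fails it, since no reordering can make both rows non-decreasing.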

artificial intelligence, estimator, machine learning
