Goto

Collaborating Authors

Error Rate Bounds in Crowdsourcing Models

arXiv.org Machine Learning

Crowdsourcing is an effective tool for human-powered computation on many tasks challenging for computers. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules under the Dawid-Skene crowdsourcing model. The bounds can be applied to analyze many common prediction methods, including the majority voting and weighted majority voting. These bound results could be useful for controlling the error rate and designing better algorithms. We show that the oracle Maximum A Posterior (MAP) rule approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule, and propose a simple data-driven weighted majority voting (WMV) rule (called one-step WMV) that attempts to approximate the oracle MAP and has a provable theoretical guarantee on the error rate. Moreover, we use simulated and real data to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule, and to demonstrate that the mean error rate of the data-driven EM-MAP rule is also bounded by the mean error rate bound of the oracle MAP rule with estimated parameters plugging into the bound.


Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing

arXiv.org Machine Learning

Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model. The bounds are derived for multi-class labeling, and can be used to analyze many aggregation methods, including majority voting, weighted majority voting and the oracle Maximum A Posteriori (MAP) rule. We show that the oracle MAP rule approximately optimizes our upper bound on the mean error rate of weighted majority voting in certain setting. We propose an iterative weighted majority voting (IWMV) method that optimizes the error rate bound and approximates the oracle MAP rule. Its one step version has a provable theoretical guarantee on the error rate. The IWMV method is intuitive and computationally simple. Experimental results on simulated and real data show that IWMV performs at least on par with the state-of-the-art methods, and it has a much lower computational cost (around one hundred times faster) than the state-of-the-art methods.


Minimax Optimal Convergence Rates for Estimating Ground Truth from Crowdsourced Labels

arXiv.org Machine Learning

Crowdsourcing has become a primary means for label collection in many real-world machine learning applications. A classical method for inferring the true labels from the noisy labels provided by crowdsourcing workers is Dawid-Skene estimator. In this paper, we prove convergence rates of a projected EM algorithm for the Dawid-Skene estimator. The revealed exponent in the rate of convergence is shown to be optimal via a lower bound argument. Our work resolves the long standing issue of whether Dawid-Skene estimator has sound theoretical guarantees besides its good performance observed in practice. In addition, a comparative study with majority voting illustrates both advantages and pitfalls of the Dawid-Skene estimator.


Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing

Neural Information Processing Systems

The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.


Reducing Crowdsourcing to Graphon Estimation, Statistically

arXiv.org Machine Learning

Inferring the correct answers to binary tasks based on multiple noisy answers in an unsupervised manner has emerged as the canonical question for micro-task crowdsourcing or more generally aggregating opinions. In graphon estimation, one is interested in estimating edge intensities or probabilities between nodes using a single snapshot of a graph realization. In the recent literature, there has been exciting development within both of these topics. In the context of crowdsourcing, the key intellectual challenge is to understand whether a given task can be more accurately denoised by aggregating answers collected from other different tasks. In the context of graphon estimation, precise information limits and estimation algorithms remain of interest. In this paper, we utilize a statistical reduction from crowdsourcing to graphon estimation to advance the state-of-art for both of these challenges. We use concepts from graphon estimation to design an algorithm that achieves better performance than the {\em majority voting} scheme for a setup that goes beyond the {\em rank one} models considered in the literature. We use known explicit lower bounds for crowdsourcing to provide refined lower bounds for graphon estimation.