Reputation-based Worker Filtering in Crowdsourcing

Neural Information Processing Systems

In this paper, we study the problem of aggregating noisy labels from crowd workers to infer the underlying true labels of binary tasks. Unlike most prior work which has examined this problem under the random worker paradigm, we consider a much broader class of {\em adversarial} workers with no specific assumptions on their labeling strategy. Our key contribution is the design of a computationally efficient reputation algorithm to identify and filter out these adversarial workers in crowdsourcing systems. Our algorithm uses the concept of optimal semi-matchings in conjunction with worker penalties based on label disagreements, to assign a reputation score for every worker. We provide strong theoretical guarantees for deterministic adversarial strategies as well as the extreme case of {\em sophisticated} adversaries where we analyze the worst-case behavior of our algorithm.

Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization

AAAI Conferences

In crowdsourced relevance judging, each crowd workertypically judges only a small number of examples,yielding a sparse and imbalanced set of judgments inwhich relatively few workers influence output consensuslabels, particularly with simple consensus methodslike majority voting. We show how probabilistic matrixfactorization, a standard approach in collaborative filtering,can be used to infer missing worker judgments suchthat all workers influence output labels. Given completeworker judgments inferred by PMF, we evaluate impactin unsupervised and supervised scenarios. In thesupervised case, we consider both weighted voting andworker selection strategies based on worker accuracy.Experiments on a synthetic data set and a real turk dataset with crowd judgments from the 2010 TREC RelevanceFeedback Track show promise of the PMF approachmerits further investigation and analysis.

Cheaper and Better: Selecting Good Workers for Crowdsourcing Machine Learning

Crowdsourcing provides a popular paradigm for data collection at scale. We study the problem of selecting subsets of workers from a given worker pool to maximize the accuracy under a budget constraint. One natural question is whether we should hire as many workers as the budget allows, or restrict on a small number of top-quality workers. By theoretically analyzing the error rate of a typical setting in crowdsourcing, we frame the worker selection problem into a combinatorial optimization problem and propose an algorithm to solve it efficiently. Empirical results on both simulated and real-world datasets show that our algorithm is able to select a small number of high-quality workers, and performs as good as, sometimes even better than, the much larger crowds as the budget allows.

Sentiment Analysis via Deep Hybrid Textual-Crowd Learning Model

AAAI Conferences

Crowdsourcing technique provides an efficient platform to employ human skills in sentiment analysis, which is a difficult task for automatic language models due to the large variations in context, writing style, view point and so on. However, the standard crowdsourcing aggregation models are incompetent when the number of crowd labels per worker is not sufficient to train parameters, or when it is not feasible to collect labels for each sample in a large dataset. In this paper, we propose a novel hybrid model to exploit both crowd and text data for sentiment analysis, consisting of a generative crowdsourcing aggregation model and a deep sentimental autoencoder. Combination of these two sub-models is obtained based on a probabilistic framework rather than a heuristic way. We introduce a unified objective function to incorporate the objectives of both sub-models, and derive an efficient optimization algorithm to jointly solve the corresponding problem. Experimental results indicate that our model achieves superior results in comparison with the state-of-the-art models, especially when the crowd labels are scarce.

Multi-Prototype Label Ranking with Novel Pairwise-to-Total-Rank Aggregation

AAAI Conferences

We propose a multi-prototype-based algorithm for online learning of soft pairwise-preferences over labels. The algorithm learns soft label preferences via minimization of the proposed soft rank-loss measure, and can learn from total orders as well as from various types of partial orders. The soft pairwise preference algorithm outputs are further aggregated to produce a total label ranking prediction using a novel aggregation algorithm that outperforms existing aggregation solutions. Experiments on synthetic and real-world data demonstrate state-of-the-art performance of the proposed model.