Lewenberg, Yoad (The Hebrew University of Jerusalem) | Bachrach, Yoram (Digital Genius Ltd.) | Paquet, Ulrich (Microsoft Research, Cambridge) | Rosenschein, Jeffrey S. (The Hebrew University of Jerusalem)
We examine the surveying problem, where we attempt to predict how a target user is likely to respond to questions by iteratively querying that user and drawing collaboratively on the responses of a sample set of users. We focus on an active learning approach, where the next question we select to ask the user depends on their responses to the previous questions. We propose a method for solving the problem based on a Bayesian dimensionality reduction technique. We empirically evaluate our method, contrasting it with benchmark approaches based on augmented linear regression, and show that it achieves much better predictive performance and is much more robust when there is missing data.
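The active selection loop described above can be illustrated with a minimal sketch. It is not the Bayesian dimensionality reduction method of the abstract: it swaps in a simple uncertainty-sampling heuristic, weighting sample users by their agreement with the target user so far and asking the unanswered question whose weighted response variance is largest. The function name `next_question`, the exponential weighting, and the toy data are assumptions made for illustration only.

```python
import math

def next_question(sample, answered):
    """Pick the next question to ask a target user (illustrative heuristic only).

    sample:   list of rows, one per sample user, each a list of numeric responses.
    answered: dict mapping question index -> the target user's response so far.
    Returns (index of the next question or None, list of predicted responses).
    """
    n_questions = len(sample[0])

    # Weight each sample user by similarity to the target on answered questions.
    weights = []
    for row in sample:
        dist = sum(abs(row[q] - a) for q, a in answered.items())
        weights.append(math.exp(-dist))
    total = sum(weights)
    weights = [w / total for w in weights]

    best_q, best_var, means = None, -1.0, []
    for q in range(n_questions):
        # Weighted mean is the current prediction for question q.
        mean = sum(w * row[q] for w, row in zip(weights, sample))
        # Weighted variance is used as the uncertainty of that prediction.
        var = sum(w * (row[q] - mean) ** 2 for w, row in zip(weights, sample))
        means.append(mean)
        if q not in answered and var > best_var:
            best_q, best_var = q, var
    return best_q, means

# Hypothetical binary responses of four sample users to three questions.
sample = [[1, 1, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0]]
first, _ = next_question(sample, {})
second, predictions = next_question(sample, {first: 1})
```

After the target user answers the first selected question, the weights shift toward sample users who gave the same answer, so both the predictions and the choice of the following question depend on the earlier responses.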
The questions in a crowdsourcing task typically exhibit varying degrees of difficulty and subjectivity. Their joint effects give rise to the variation in responses to the same question by different crowd-workers. This variation is low when the question is easy to answer and objective, and high when it is difficult and subjective. Unfortunately, current quality control methods for crowdsourcing consider only the question difficulty to account for the variation. As a result, these methods cannot distinguish workers' personal preferences for different correct answers to a partially subjective question from their ability or expertise to avoid objectively wrong answers to that question. To address this issue, we present a probabilistic model which (i) explicitly encodes question difficulty as a model parameter and (ii) implicitly encodes question subjectivity via latent preference factors for crowd-workers. We show that question subjectivity induces grouping of crowd-workers, revealed through clustering of their latent preferences. Moreover, we develop a quantitative measure of the subjectivity of a question. Experiments show that our model (1) improves the performance of both quality control for crowd-sourced answers and next-answer prediction for crowd-workers, and (2) can potentially provide coherent rankings of questions in terms of their difficulty and subjectivity, so that task providers can refine their designs of the crowdsourcing tasks, e.g. by removing highly subjective questions or inappropriately difficult questions.
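The notion of response variation for a single question can be made concrete with a small sketch. It computes the normalized Shannon entropy of the answers given to one question, which is one common way to quantify disagreement; it is not the subjectivity measure developed in the abstract, and, like the variation it describes, it cannot separate difficulty from subjectivity. The function name `response_entropy` and the example answers are hypothetical.

```python
import math
from collections import Counter

def response_entropy(responses, num_options):
    """Normalized Shannon entropy of the answers to one question.

    Returns 0.0 when all workers agree and 1.0 when the answers are spread
    uniformly over all `num_options` possible answers.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    if num_options < 2:
        raise ValueError("num_options must be at least 2")
    counts = Counter(responses)
    if len(counts) == 1:
        return 0.0  # full agreement
    total = len(responses)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_options)

# Hypothetical answers from crowd-workers to two four-option questions.
easy_objective = ["a", "a", "a", "a", "a"]
hard_or_subjective = ["a", "b", "c", "d", "b", "a"]
low = response_entropy(easy_objective, 4)
high = response_entropy(hard_or_subjective, 4)
```

A high value alone does not say whether workers disagree because the question is hard or because several answers are acceptable, which is the gap the latent preference factors in the abstract are meant to address.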
Online crowdsourcing provides a scalable and inexpensive means to collect knowledge (e.g. labels) about various types of data items (e.g. text, audio, video). However, it is also known to result in large variance in the quality of recorded responses, which often cannot be directly used for training machine learning systems. To resolve this issue, much work has been done to control response quality so that low-quality responses do not adversely affect the performance of the machine learning systems. Such work is referred to as quality control for crowdsourcing. Past quality control research can be divided into two major branches: quality control mechanism design and statistical models. The first branch focuses on designing measures, thresholds, interfaces and workflows for payment, gamification, question assignment and other mechanisms that influence workers' behaviour. The second branch focuses on developing statistical models to perform effective aggregation of responses to infer correct responses. The two branches are connected, as statistical models (i) provide parameter estimates to support the measure and threshold calculation, and (ii) encode modelling assumptions used to derive (theoretical) performance guarantees for the mechanisms. There are surveys regarding each branch, but they lack technical details about the other branch. Our survey is the first to bridge the two branches by providing technical details on how they work together under frameworks that systematically unify the crowdsourcing aspects modelled by both of them to determine the response quality. We are also the first to provide taxonomies of quality control papers based on the proposed frameworks. Finally, we specify the current limitations of quality control research and the corresponding future directions.
Generation and evaluation of crowdsourced content are commonly treated as two separate processes, performed at different times and by two distinct groups of people: content creators and content assessors. As a result, most crowdsourcing tasks follow this template: one group of workers generates content and another group of workers evaluates it. In an educational setting, for example, content creators are traditionally students who submit open-response answers to assignments (e.g., a short answer, a circuit diagram, or a formula) and content assessors are instructors who grade these submissions. Despite the considerable success of peer-grading in massive open online courses (MOOCs), test-taking and grading are still treated as two distinct tasks, which typically occur at different times and require the additional overhead of grader training and incentivization. Inspired by this problem in the context of education, we propose a general crowdsourcing framework that fuses open-response test-taking (content generation) and assessment into a single, streamlined process that appears to students in the form of an explicit test, but where everyone also acts as an implicit grader. The advantages offered by our framework include a common incentive mechanism for both the creation and evaluation of content, and a probabilistic model that jointly models the processes of contribution and evaluation, facilitating efficient estimation of the quality of the contributions and the competency of the contributors. We demonstrate the effectiveness and limits of our framework via simulations and a real-world user study.
We propose a probabilistic model to aggregate the answers of respondents answering multiple-choice questions. The model does not assume that everyone has access to the same information, and so does not assume that the consensus answer is correct. Instead, it infers the most probable world state, even if only a minority vote for it. Each respondent is modeled as receiving a signal contingent on the actual world state, and as using this signal both to determine their own answer and to predict the answers given by others. By incorporating respondents' predictions of others' answers, the model infers latent parameters corresponding to the prior over world states and the probability of different signals being received in all possible world states, including counterfactual ones. Unlike other probabilistic models for aggregation, our model applies to both single and multiple questions; in the latter case it also estimates each respondent's expertise. The model shows good performance, compared to a number of other probabilistic models, on data from seven studies covering different types of expertise.
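How predictions of others' answers can overturn a majority can be shown with a minimal sketch. It does not implement the probabilistic model of the abstract: it uses the simpler "surprisingly popular" rule, which selects the answer whose actual vote share most exceeds the share respondents predicted for it. The function names and the ten-respondent data set are hypothetical.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent answer."""
    return Counter(votes).most_common(1)[0][0]

def surprisingly_popular(votes, predictions, options):
    """Return the option whose actual share most exceeds its predicted share.

    votes:       list with each respondent's own answer.
    predictions: list of dicts, one per respondent, mapping each option to the
                 fraction of respondents they expect to choose it.
    """
    n = len(votes)
    actual = {o: sum(v == o for v in votes) / n for o in options}
    predicted = {o: sum(p[o] for p in predictions) / len(predictions)
                 for o in options}
    return max(options, key=lambda o: actual[o] - predicted[o])

# Hypothetical data: six respondents answer "yes" and expect 80% "yes";
# four better-informed respondents answer "no" but still expect 70% "yes".
votes = ["yes"] * 6 + ["no"] * 4
predictions = ([{"yes": 0.8, "no": 0.2}] * 6
               + [{"yes": 0.7, "no": 0.3}] * 4)

majority_answer = majority_vote(votes)
sp_answer = surprisingly_popular(votes, predictions, ["yes", "no"])
```

In this toy data the majority answers "yes", but "no" receives 40% of the votes against a predicted 24%, so the rule selects the minority answer. The model in the abstract goes further by inferring the prior over world states and the signal probabilities rather than comparing shares directly.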