We present a clustered personal classifier method (CPC method) that jointly estimates a classifier and clusters of workers in order to address the learning from crowds problem. Crowdsourcing allows us to create a large but low-quality data set at very low cost. The learning from crowds problem is to learn a classifier from such a low-quality data set. From some observations, we notice that workers form clusters according to their abilities. Although this fact has been pointed out several times, no method has applied it to the learning from crowds problem. We propose a CPC method that utilizes the clusters of the workers to improve the performance of the obtained classifier, where both the classifier and the clusters of the workers are estimated. The proposed method has two advantages. One is that it realizes robust estimation of the classifier because it utilizes the prior knowledge that workers tend to form clusters. The other is that we can obtain the clusters of the workers, which help us analyze the properties of the workers. Experimental results on synthetic and real data sets indicate that the proposed method can estimate the classifier robustly. In addition, clustering workers is shown to work well. In particular, on the real data set, an outlier worker was found by applying the proposed method.
Crowdsourcing services are often used to collect a large amount of labeled data for machine learning. Although they provide us an easy way to get labels at very low cost in a short period, they have serious limitations. One of them is the variable quality of the crowd-generated data. There have been many attempts to increase the reliability of crowd-generated data and the quality of classifiers obtained from such data. However, in these problem settings, relatively few researchers have tried using expert-generated data to achieve further improvements. In this paper, we extend three models that deal with the problem of learning from crowds to utilize ground truths: a latent class model, a personal classifier model, and a data-dependent error model. We evaluate the proposed methods against two baseline methods on a real data set to demonstrate the effectiveness of combining crowd-generated data and expert-generated data.
Although supervised learning requires a labeled dataset, obtaining labels from experts is generally expensive. For this reason, crowdsourcing services are attracting attention in the field of machine learning as a way to collect labels at relatively low cost. However, the labels obtained by crowdsourcing, i.e., from non-expert workers, are often noisy. A number of methods have thus been devised for inferring true labels, and several methods have been proposed for learning classifiers directly from crowdsourced labels, referred to as "learning from crowds." A more practical problem is learning from crowdsourced labeled data and unlabeled data, i.e., "semi-supervised learning from crowds." This paper presents a novel generative model of the labeling process in crowdsourcing. It leverages unlabeled data effectively by introducing latent features and a data distribution. Because the data distribution can be complicated, we use a deep neural network to model it. Therefore, our model can be regarded as a kind of deep generative model. The problems caused by the intractability of latent variable posteriors are solved by introducing an inference model. The experiments show that it outperforms four existing models, including a baseline model, on the MNIST dataset with simulated workers and the Rotten Tomatoes movie review dataset with Amazon Mechanical Turk workers.
We have developed a method for using confidence scores to integrate labels provided by crowdsourcing workers. Although confidence scores can be useful information for estimating the quality of the provided labels, a way to effectively incorporate them into the integration process has not been established. Moreover, some workers are overconfident about the quality of their labels while others are underconfident, and some workers are quite accurate in judging the quality of their labels. This differing reliability of the confidence scores among workers means that the probability distributions for the reported confidence scores differ among workers. To address this problem, we extended the Dawid-Skene model and created two probabilistic models in which the values of unobserved true labels are inferred from the observed provided labels and reported confidence scores by using the expectation-maximization algorithm. Results of experiments using actual crowdsourced data for image labeling and binary question answering tasks showed that incorporating workers' confidence scores can improve the accuracy of integrated crowdsourced labels.
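As a rough illustration of the starting point mentioned above, the following is a minimal sketch of the plain Dawid-Skene model fitted with expectation-maximization. It does not include the confidence-score extensions described in the abstract. The function name `dawid_skene`, the `(item, worker, label)` input format, the `smoothing` constant, and the toy data are all assumptions made for this example.

```python
import math

def dawid_skene(triples, n_classes, n_iter=50, smoothing=0.01):
    """Plain Dawid-Skene EM (hypothetical helper, not the extended models).

    triples: iterable of (item, worker, label), label in range(n_classes).
    Returns (posteriors over true labels, per-worker confusion matrices).
    """
    triples = list(triples)
    items = sorted({i for i, _, _ in triples})
    workers = sorted({w for _, w, _ in triples})

    # Initialise posteriors over true labels with per-item vote fractions.
    post = {i: [0.0] * n_classes for i in items}
    for i, _, l in triples:
        post[i][l] += 1.0
    for i in items:
        total = sum(post[i])
        post[i] = [p / total for p in post[i]]

    for _ in range(n_iter):
        # M-step: class prior and smoothed per-worker confusion matrices,
        # where conf[w][k][l] estimates P(worker w reports l | true label k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                for w in workers}
        for i, w, l in triples:
            for k in range(n_classes):
                conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [c / row_sum for c in conf[w][k]]

        # E-step: posterior over each item's true label, computed in log space.
        log_post = {i: [math.log(max(p, 1e-300)) for p in prior] for i in items}
        for i, w, l in triples:
            for k in range(n_classes):
                log_post[i][k] += math.log(conf[w][k][l])
        for i in items:
            m = max(log_post[i])
            unnorm = [math.exp(v - m) for v in log_post[i]]
            z = sum(unnorm)
            post[i] = [u / z for u in unnorm]
    return post, conf

# Toy data (made up): workers "a" and "b" always agree, "c" always disagrees.
reference = [0, 1, 0, 1, 0, 1]
triples = []
for item, y in enumerate(reference):
    triples += [(item, "a", y), (item, "b", y), (item, "c", 1 - y)]

post, conf = dawid_skene(triples, n_classes=2)
inferred = [max(range(2), key=lambda k: post[i][k]) for i in range(len(reference))]
```

In this toy run the inferred labels follow the two agreeing workers, and the confusion matrix learned for worker "c" marks that worker as systematically contrary. A confidence-aware variant would additionally model each worker's reported confidence, which this sketch leaves out.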
Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple-choice (or labeling) questions are a common type of problem solved by this approach. As an application, crowd labeling is used to find true labels for large machine learning datasets. Since crowds are not necessarily experts, the labels they provide are rather noisy and erroneous. This challenge is usually resolved by collecting multiple labels for each sample and then aggregating them to estimate the true label. Although this mechanism leads to high-quality labels, it is not actually cost-effective. As a result, efforts are currently being made to maximize the accuracy of the estimated true labels while fixing the number of acquired labels. This paper surveys methods to aggregate redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model where the differences among popular methods in the field correspond to different choices for the parameters of the model. Afterwards, algorithms to perform inference on these models are surveyed. Moreover, adaptive methods, which iteratively collect labels based on the previously collected labels and estimated models, are discussed. In addition, this paper compares the distinguished methods and provides guidelines for future work required to address the current open issues.
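The simplest way to aggregate redundant labels, as described above, is majority voting. The following minimal sketch shows that baseline only; it is not one of the latent-variable models the survey unifies. The function name `majority_vote`, the tie-breaking rule (smallest label wins), and the example votes are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Return the most frequent label per item; ties go to the smallest label."""
    result = {}
    for item, labels in labels_by_item.items():
        counts = Counter(labels)
        top = max(counts.values())
        result[item] = min(l for l, c in counts.items() if c == top)
    return result

# Hypothetical redundant labels collected from several workers per item.
votes = {"img1": [1, 1, 0], "img2": [0, 0, 1, 0], "img3": [1, 0]}
aggregated = majority_vote(votes)
```

Majority voting treats every worker as equally reliable, which is the assumption that model-based aggregation methods relax by estimating worker-specific parameters.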