Uncertainty
Crowd Labeling: a survey
Muhammadi, Jafar; Rabiee, Hamid Reza; Hosseini, Abbas
Recently, there has been a surge in research on human computation via crowdsourcing. Multiple-choice (or labeling) questions are a common type of problem solved by this approach. As an application, crowd labeling is used to find true labels for large machine learning datasets. Since crowd workers are not necessarily experts, the labels they provide are noisy and error-prone. This challenge is usually addressed by collecting multiple labels for each sample and then aggregating them to estimate the true label. Although this mechanism leads to high-quality labels, it is not cost-effective. As a result, current efforts aim to maximize the accuracy of the estimated true labels for a fixed number of acquired labels. This paper surveys methods for aggregating redundant crowd labels in order to estimate unknown true labels. It presents a unified statistical latent model in which the differences among popular methods in the field correspond to different choices for the parameters of the model. Algorithms for inference on these models are then surveyed. Moreover, adaptive methods that iteratively collect labels based on previously collected labels and estimated models are discussed. In addition, the paper compares the most prominent methods and provides guidelines for future work required to address current open issues.
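To make the aggregation step concrete, here is a minimal sketch of the simplest baseline, majority voting over redundant labels, followed by a rough per-worker agreement rate. It is an illustration only and not one of the surveyed latent-model methods; the function names, the (item, worker, label) triple format, and the toy data are assumptions introduced here.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (item_id, worker_id, label) triples by majority vote.

    Returns {item_id: most frequent label}. Ties go to the label
    that was seen first for that item.
    """
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in annotations:
        votes[item_id][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(annotations, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item_id, worker_id, label in annotations:
        totals[worker_id] += 1
        hits[worker_id] += int(label == consensus[item_id])
    return {worker: hits[worker] / totals[worker] for worker in totals}


# Hypothetical toy data: three workers label three images.
annotations = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "dog"), ("img2", "w3", "dog"),
    ("img3", "w1", "cat"), ("img3", "w2", "dog"), ("img3", "w3", "dog"),
]

consensus = majority_vote(annotations)
agreement = worker_agreement(annotations, consensus)
print(consensus)
print(agreement)
```

Majority voting treats every worker as equally reliable. The agreement rates hint at why model-based aggregation can help: a method that estimates worker reliability can weight labels unevenly instead of counting them equally.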
A sequential reduction method for inference in generalized linear mixed models
Generalized linear mixed models are a natural and widely used class of models, but one in which the likelihood often involves an integral of very high dimension. Because of this intractability, many alternative methods have been developed for inference in these models. One class of approaches involves replacing the likelihood with some approximation, for example using Laplace's method or importance sampling. However, these approximations can fail in cases where the structure of the model is sparse, in that only a small amount of information is available on each random effect, especially when the data are binary. If there are n random effects in total, the likelihood may always be written as an n-dimensional integral over these random effects. If there are a large number of random effects, then it will be computationally infeasible to obtain an accurate approximation to this n-dimensional integral by direct numerical integration.
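As a sketch of the integral described above, the likelihood can be written in generic notation. The symbols are assumptions introduced here, not the paper's own: y is the observed data, u the vector of n random effects, θ the model parameters, f the conditional density of the data given the random effects, and φ_n an n-dimensional normal density with covariance Σ(θ), the usual choice in these models.

```latex
L(\theta) \;=\; \int_{\mathbb{R}^n} f(y \mid u;\, \theta)\,
    \phi_n\bigl(u;\, 0,\, \Sigma(\theta)\bigr)\, \mathrm{d}u
```

To see why direct numerical integration becomes infeasible, consider a tensor-product quadrature rule with k points per dimension: it needs k^n evaluations of the integrand, so even k = 10 and n = 20 already requires 10^20 evaluations.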