Collaborating Authors


Joint Multi-Dimensional Model for Global and Time-Series Annotations Machine Learning

Crowdsourcing is a popular approach to collect annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive untrained annotators for each data instance which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes however ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly leading to more accurate ground truth estimates. The model we propose is applicable to both global and time series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm and we evaluate its performance using synthetic data and real emotion corpora as well as on an artificial task with human annotations

Reliable Aggregation of Boolean Crowdsourced Tasks

AAAI Conferences

We propose novel algorithms for the problem of crowdsourcing binary labels. Such binary labeling tasks are very common in crowdsourcing platforms, for instance, to judge the appropriateness of web content or to flag vandalism. We propose two unsupervised algorithms: one simple to implement albeit derived heuristically, and one based on iterated bayesian parameter estimation of user reputation models. We provide mathematical insight into the benefits of the proposed algorithms over existing approaches, and we confirm these insights by showing that both algorithms offer improved performance on many occasions across both synthetic and real-world datasets obtained via Amazon Mechanical Turk.