HodgeRank With Information Maximization for Crowdsourced Pairwise Ranking Aggregation

AAAI Conferences

Recently, crowdsourcing has emerged as an effective paradigm for human-powered large scale problem solving in various domains. However, task requester usually has a limited amount of budget, thus it is desirable to have a policy to wisely allocate the budget to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on Hodge Decomposition of pairwise ranking data with multiple workers. The principle exhibits two scenarios of active sampling: Fisher information maximization that leads to unsupervised sampling based on a sequential maximization of graph algebraic connectivity without considering labels; and Bayesian information maximization that selects samples with the largest information gain from prior to posterior, which gives a supervised sampling involving the labels collected. Experiments show that the proposed methods boost the sampling efficiency as compared to traditional sampling schemes and are thus valuable to practical crowdsourcing experiments.


HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation

arXiv.org Machine Learning

Recently, crowdsourcing has emerged as an effective paradigm for human-powered large scale problem solving in various domains. However, task requester usually has a limited amount of budget, thus it is desirable to have a policy to wisely allocate the budget to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on Hodge Decomposition of pairwise ranking data with multiple workers. The principle exhibits two scenarios of active sampling: Fisher information maximization that leads to unsupervised sampling based on a sequential maximization of graph algebraic connectivity without considering labels; and Bayesian information maximization that selects samples with the largest information gain from prior to posterior, which gives a supervised sampling involving the labels collected. Experiments show that the proposed methods boost the sampling efficiency as compared to traditional sampling schemes and are thus valuable to practical crowdsourcing experiments.


A Bayesian Framework for Robust Reasoning from Sensor Networks

AAAI Conferences

The work described in this paper defines a Bayesian framework to use noisy, but redundant data from multiple sensor streams and incorporate it with the contextual and domain knowledge that is provided by both the physical constraints imposed by the local environment where the sensors are located and by the people that are involved in the surveillance tasks. The paper also presents the preliminary results of applying the Bayesian framework to the people localization problem in indoor environment using a sensor network that consists of video cameras, infrared tag readers and a fingerprint reader.


Bayesian Bias Mitigation for Crowdsourcing

Neural Information Processing Systems

Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the {\it effects} of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the {\it sources} of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics.


Modeling a Sensor to Improve its Efficacy

arXiv.org Machine Learning

Robots rely on sensors to provide them with information about their surroundings. However, high-quality sensors can be extremely expensive and cost-prohibitive. Thus many robotic systems must make due with lower-quality sensors. Here we demonstrate via a case study how modeling a sensor can improve its efficacy when employed within a Bayesian inferential framework. As a test bed we employ a robotic arm that is designed to autonomously take its own measurements using an inexpensive LEGO light sensor to estimate the position and radius of a white circle on a black field. The light sensor integrates the light arriving from a spatially distributed region within its field of view weighted by its Spatial Sensitivity Function (SSF). We demonstrate that by incorporating an accurate model of the light sensor SSF into the likelihood function of a Bayesian inference engine, an autonomous system can make improved inferences about its surroundings. The method presented here is data-based, fairly general, and made with plug-and play in mind so that it could be implemented in similar problems.