


How Do Crowdworker Communities and Microtask Markets Influence Each Other? A Data-Driven Study on Amazon Mechanical Turk

AAAI Conferences

Crowdworker online communities, which operate in fora such as mTurkForum and TurkerNation, are important actors in microwork markets. Although these communities are central to market dynamics, how their behavior and the dynamics of online marketplaces influence each other is not yet understood. To provide quantitative evidence of this influence, we analyzed six years' worth of mTurk market activity and community discussions in six fora. We investigated the relationships between activity in the fora, the tasks published on mTurk, the requesters of those tasks, and task completion speed. We validate, and expand upon, results from previous work by showing that (i) differences between market demand and community activity are specific to fora and task types; (ii) the temporal progression of HIT availability in the market is predictive of the upcoming amount of crowdworker discussion, with significant differences across fora and discussion categories; and (iii) activity in the fora can have a significant positive impact on the completion speed of tasks available in the market.
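
The abstract does not include the analysis code, so the following is only a minimal sketch of one standard way to test whether one time series (here, HIT availability) is predictive of another (forum discussion volume): a Granger-causality test. The column names and the placeholder data are our own assumptions, not the paper's.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Placeholder series standing in for the crawled data: weekly counts of
# available HITs on the market and of posts across the six fora.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "forum_posts": rng.poisson(80, size=104),       # hypothetical community series
    "hits_available": rng.poisson(500, size=104),   # hypothetical market series
})

# grangercausalitytests checks whether the second column helps predict the
# first one, here: does past HIT availability predict forum activity?
results = grangercausalitytests(df[["forum_posts", "hits_available"]], maxlag=4)
```

In practice, both series would come from the collected market snapshots and forum posts, aggregated over a common time granularity, with the test repeated per forum and per discussion category.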


LearningQ: A Large-Scale Dataset for Educational Question Generation

AAAI Conferences

We present LearningQ, a challenging dataset for educational question generation containing over 230K document-question pairs. It includes 7K instructor-designed questions that assess the knowledge concepts being taught and 223K learner-generated questions that seek in-depth understanding of those concepts. We show that, compared to existing datasets that can be used to generate educational questions, LearningQ (i) covers a wide range of educational topics and (ii) contains long, cognitively demanding documents for which question generation requires reasoning over the relationships between sentences and paragraphs. As a result, a significant percentage of LearningQ questions (~30%) require higher-order cognitive skills to solve (such as applying and analyzing), in contrast to existing question-generation datasets, which are designed mostly for the lowest cognitive skill level (i.e., remembering). To understand how effective existing question generation methods are at producing educational questions, we evaluate both rule-based and deep-neural-network-based methods on LearningQ. Extensive experiments show that state-of-the-art methods which perform well on existing datasets cannot generate useful educational questions. This implies that LearningQ is a challenging test bed for the generation of high-quality educational questions and merits further investigation. We open-source the dataset and our code at https://dataverse.mpi-sws.org/dataverse/icwsm18.
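
The abstract does not name the automatic metrics used in the evaluation; BLEU is a common choice for question generation, and the sketch below shows how a generated question could be scored against a reference with NLTK. The toy strings are ours, not LearningQ examples.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Toy reference/candidate pair; a real evaluation would iterate over the
# document-question pairs in the LearningQ test split.
reference = "what is the time complexity of binary search".split()
candidate = "what is the complexity of binary search".split()

# Smoothed BLEU-4 between one generated question and one reference question.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```

Such surface-overlap metrics say nothing about whether a question demands higher-order cognitive skills, which is part of what makes LearningQ a hard benchmark.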


Geographic Differential Privacy for Mobile Crowd Coverage Maximization

AAAI Conferences

For real-world mobile applications such as location-based advertising and spatial crowdsourcing, a key to success is targeting mobile users who can maximally cover certain locations in a future period. To find an optimal group of users, existing methods often require information about users' mobility histories, which may cause privacy breaches. In this paper, we propose a method to maximize a mobile crowd's future location coverage under a guaranteed location privacy protection scheme. In our approach, users need to upload only one of their frequently visited locations, and, more importantly, the uploaded location is obfuscated using a geographic differential privacy policy. We propose both analytic and practical solutions to this problem. Experiments on real user mobility datasets show that our method significantly outperforms state-of-the-art geographic differential privacy methods, achieving higher coverage under the same level of privacy protection.
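
The abstract does not spell out the obfuscation mechanism, but a standard instantiation of geographic differential privacy is the planar Laplace mechanism of geo-indistinguishability (Andrés et al., 2013). The sketch below, with function names of our own choosing, samples such noise by drawing a uniform angle and inverting the radial CDF via the Lambert W function.

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace_noise(epsilon, rng=None):
    """Sample a 2-D offset from the planar Laplace distribution used by
    geo-indistinguishability. Smaller epsilon means stronger privacy."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * np.pi)   # direction: uniform angle
    p = rng.uniform(0.0, 1.0)               # probability mass for the radius
    # Invert the radial CDF C(r) = 1 - (1 + eps*r) * exp(-eps*r) using the
    # lower branch of the Lambert W function.
    r = -(lambertw((p - 1.0) / np.e, k=-1).real + 1.0) / epsilon
    return r * np.cos(theta), r * np.sin(theta)

def obfuscate(x, y, epsilon):
    """Obfuscate a planar location (e.g., meters in a local projection)."""
    dx, dy = planar_laplace_noise(epsilon)
    return x + dx, y + dy

# Example: report a frequently visited location with epsilon = 0.01 per meter.
noisy_x, noisy_y = obfuscate(600_000.0, 200_000.0, epsilon=0.01)
```

Coordinates are assumed to be in a planar projection; a smaller epsilon yields a larger expected displacement and thus stronger location privacy.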


Scaling-Up the Crowd: Micro-Task Pricing Schemes for Worker Retention and Latency Improvement

AAAI Conferences

Retaining workers on micro-task crowdsourcing platforms is essential to guarantee the timely completion of batches of Human Intelligence Tasks (HITs). Worker retention is also a necessary condition for introducing SLAs on crowdsourcing platforms. In this paper, we introduce novel pricing schemes aimed at improving the retention rate of workers on long batches of similar tasks. We show how increasing or decreasing the monetary reward over time influences the number of tasks a worker is willing to complete in a batch, as well as the overall latency. We compare our new pricing schemes against traditional pricing methods (e.g., a constant reward for all the HITs in a batch) and empirically show how certain schemes effectively act as an incentive for workers to keep working longer on a given batch of HITs. Our experimental results show that the best pricing scheme in terms of worker retention is based on bonuses paid whenever workers reach predefined milestones.
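
The abstract compares constant, increasing, decreasing, and milestone-bonus rewards; the toy function below makes those schemes concrete. All scheme names and parameter values are illustrative assumptions, not the paper's actual parametrization.

```python
def reward_for_hit(i, scheme="constant", base=0.05, delta=0.001,
                   milestone=50, bonus=0.50):
    """Toy per-HIT reward (in dollars) for the i-th task of a batch.
    Scheme names and parameter values are illustrative only."""
    if scheme == "constant":      # traditional: same reward for every HIT
        return base
    if scheme == "increasing":    # reward grows as the worker progresses
        return base + i * delta
    if scheme == "decreasing":    # front-loaded reward that tapers off
        return max(base - i * delta, 0.01)
    if scheme == "milestone":     # flat reward plus a bonus at fixed milestones
        return base + (bonus if (i + 1) % milestone == 0 else 0.0)
    raise ValueError(f"unknown scheme: {scheme}")

# Total payout for a worker who completes 100 HITs under the milestone scheme.
total = sum(reward_for_hit(i, scheme="milestone") for i in range(100))
```

Under the milestone scheme, a worker sees a flat per-HIT reward punctuated by larger payouts at fixed points in the batch, which the abstract reports as the most effective retention incentive.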