Adaptive Performance Optimization over Crowd Labor Channels
Karanam, Saraschandra (Xerox Research Centre India) | Chander, Deepthi (Xerox Research Centre India) | Celis, Elisa Laura (Ecole Polytechnique Federale de Lausanne (EPFL)) | Dasgupta, Koustuv (Xerox Research Centre India) | Rajan, Vaibhav (Xerox Research Centre India)
Parallel Task Routing for Crowdsourcing
Bragg, Jonathan (University of Washington) | Kolobov, Andrey (Microsoft Research) | Mausam, Mausam (Indian Institute of Technology, Delhi) | Weld, Daniel S. (University of Washington)
An ideal crowdsourcing or citizen-science system would route tasks to the most appropriate workers, but the best assignment is unclear because workers have varying skill, tasks have varying difficulty, and assigning several workers to a single task may significantly improve output quality. This paper defines a space of task routing problems, proves that even the simplest is NP-hard, and develops several approximation algorithms for parallel routing problems. We show that an intuitive class of requesters' utility functions is submodular, which lets us provide iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round. Experiments with live oDesk workers show that our task routing algorithm uses only 48% of the human labor compared to the commonly used round-robin strategy. Further, we provide versions of our task routing algorithm which enable it to scale to large numbers of workers and questions and to handle workers with variable response times while still providing significant benefit over common baselines.
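A minimal sketch of the greedy step this enables, assuming known per-worker accuracies and a simple "at least one assigned worker is correct" utility (both assumptions of this illustration, not details from the paper): each worker in a round is routed to the question with the largest marginal utility gain, the greedy rule that gives near-optimal guarantees for monotone submodular objectives.

    def question_utility(accs):
        # Chance the question gets at least one correct answer from its
        # assigned workers; monotone with diminishing returns, hence
        # submodular in the assignment set.
        err = 1.0
        for a in accs:
            err *= 1.0 - a
        return 1.0 - err

    def greedy_round(workers, questions, assigned):
        # Route each available worker to the question whose utility
        # increases most under that worker's estimated accuracy.
        for _, acc in sorted(workers.items(), key=lambda kv: -kv[1]):
            best_q, best_gain = None, 0.0
            for q in questions:
                gain = question_utility(assigned[q] + [acc]) - question_utility(assigned[q])
                if gain > best_gain:
                    best_q, best_gain = q, gain
            if best_q is not None:
                assigned[best_q].append(acc)
        return assigned

    workers = {"w1": 0.9, "w2": 0.7, "w3": 0.6}   # hypothetical accuracy estimates
    print(greedy_round(workers, ["q1", "q2"], {"q1": [], "q2": []}))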
Speech Synthesis Data Collection for Visually Impaired Person
Ashikawa, Masayuki (Toshiba Corporation) | Kawamura, Takahiro (Toshiba Corporation) | Ohsuga, Akihiko (The University of Electro-Communications)
Crowdsourcing platforms provide attractive solutions for collecting speech synthesis data for visually impaired persons. However, quality control problems remain because of low-quality volunteer workers. In this paper, we propose the design of a crowdsourcing system that allows us to devise quality control methods. We introduce four worker selection methods: preprocessing filtering, real-time filtering, post-processing filtering, and guess-processing filtering. These methods include a novel approach that utilizes a collaborative filtering technique, in addition to a basic approach involving initial training or the use of gold-standard data. These quality control methods improved the quality of the collected speech synthesis data. Moreover, we have already collected 140,000 Japanese words for speech synthesis, drawn from 500 million items of web data.
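As an illustration of the gold-standard ingredient alone (the preprocessing filter; the data layout and threshold below are assumptions of this sketch, not taken from the paper):

    def preprocessing_filter(answers, gold, threshold=0.8):
        # Keep only workers whose accuracy on gold-standard items meets
        # the threshold; answers[worker][item] is the worker's label.
        kept = {}
        for worker, labels in answers.items():
            graded = [item for item in labels if item in gold]
            if not graded:
                continue
            acc = sum(labels[i] == gold[i] for i in graded) / len(graded)
            if acc >= threshold:
                kept[worker] = acc
        return kept

    answers = {"alice": {"g1": "cat", "g2": "dog", "t1": "bird"},
               "bob":   {"g1": "cat", "g2": "cat", "t1": "fish"}}
    gold = {"g1": "cat", "g2": "dog"}
    print(preprocessing_filter(answers, gold))   # alice (1.0) passes; bob (0.5) does not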
Instance-Privacy Preserving Crowdsourcing
Kajino, Hiroshi (The University of Tokyo) | Baba, Yukino (National Institute of Informatics) | Kashima, Hisashi (Kyoto University)
Crowdsourcing is a technique for outsourcing tasks to a number of workers. Although crowdsourcing has many advantages, it carries the risk that sensitive information may be leaked, which has limited the spread of its popularity. Task instances (the data workers receive in order to process tasks) often contain sensitive information, which workers can extract. For example, in an audio transcription task, an audio file corresponds to an instance, and the content of the audio (e.g., the abstract of a meeting) can be sensitive information. In this paper, we propose a quantitative analysis framework for the instance privacy problem. The proposed framework supplies performance measures for instance-privacy-preserving protocols. As a case study, we apply the framework to an instance clipping protocol and analyze its properties. The protocol preserves privacy by clipping instances to limit the amount of information workers obtain. The results show that the protocol can balance task performance and instance privacy preservation. They also show that the proposed measure is consistent with standard measures, which validates it.
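The clipping idea itself fits in a few lines. In this sketch a token sequence stands in for an audio stream, and the window size is the knob that trades task performance against privacy; all names and numbers are illustrative:

    def clip_instance(tokens, window):
        # Split an instance into fixed-size clips so that each worker
        # sees at most `window` consecutive tokens of the original.
        return [tokens[i:i + window] for i in range(0, len(tokens), window)]

    meeting = "the q3 budget will be cut by ten percent".split()
    for i, clip in enumerate(clip_instance(meeting, window=3)):
        print(f"worker {i}: {' '.join(clip)}")   # no worker sees the whole sentence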
TRACCS: A Framework for Trajectory-Aware Coordinated Urban Crowd-Sourcing
Chen, Cen (Singapore Management University) | Cheng, Shih-Fen (Singapore Management University) | Gunawan, Aldy (Singapore Management University) | Misra, Archan (Singapore Management University) | Dasgupta, Koustuv (Xerox Research Centre India) | Chander, Deepthi (Xerox Research Centre India)
We investigate the problem of large-scale mobile crowd-tasking, where a large pool of citizen crowd-workers is used to perform a variety of location-specific urban logistics tasks. Current approaches to such mobile crowd-tasking are very decentralized: a crowd-tasking platform usually provides each worker a set of available tasks close to the worker's current location; each worker then independently chooses which tasks she wants to accept and perform. In contrast, we propose TRACCS, a more coordinated task assignment approach, where the crowd-tasking platform assigns a sequence of tasks to each worker, taking into account her expected location trajectory over a wider time horizon, as opposed to just her instantaneous location. We formulate such task assignment as an optimization problem that seeks to maximize the total payoff from all assigned tasks, subject to a maximum bound on the detour (from the expected path) that a worker will experience to complete her assigned tasks. We develop computationally efficient heuristics to address this optimization problem (whose exact solution requires solving a complex integer linear program), and show, via simulations with realistic topologies and commuting patterns, that a specific heuristic (called Greedy-ILS) increases the fraction of assigned tasks by more than 20% and reduces the average detour overhead by more than 60%, compared to the current decentralized approach.
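A stripped-down version of the greedy phase can be sketched as repeated best-ratio insertion (Greedy-ILS additionally applies iterated local search, which this illustration omits; all distances, payoffs, and bounds here are made up):

    import math

    def length(route):
        return sum(math.dist(route[i], route[i + 1]) for i in range(len(route) - 1))

    def greedy_assign(route, tasks, max_detour):
        # Repeatedly insert the task with the best payoff-to-added-detour
        # ratio, keeping the total detour from the worker's expected
        # trajectory within max_detour.
        base, chosen, remaining = length(route), [], dict(tasks)
        while remaining:
            best = None
            for t, (loc, payoff) in remaining.items():
                for i in range(1, len(route)):
                    trial = route[:i] + [loc] + route[i:]
                    detour = length(trial) - base
                    ratio = payoff / (detour + 1e-9)
                    if detour <= max_detour and (best is None or ratio > best[0]):
                        best = (ratio, t, trial)
            if best is None:
                break
            _, t, route = best
            chosen.append(t)
            del remaining[t]
        return chosen, route

    commute = [(0, 0), (10, 0)]                      # expected origin -> destination
    tasks = {"t1": ((5, 1), 3.0), "t2": ((5, 8), 9.0), "t3": ((9, 1), 2.0)}
    print(greedy_assign(commute, tasks, max_detour=4.0)[0])   # ['t1', 't3']; t2 is too far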
To Re(label), or Not To Re(label)
Lin, Christopher H. (University of Washington) | Mausam, Mausam (Indian Institute of Technology, Delhi) | Weld, Daniel S. (University of Washington)
One of the most popular uses of crowdsourcing is to provide training data for supervised machine learning algorithms. Since human annotators often make errors, requesters commonly ask multiple workers to label each example. But is this strategy always the most cost-effective use of crowdsourced workers? We argue "No": often classifiers can achieve higher accuracies when trained with noisy "unilabeled" data. However, in some cases relabeling is extremely important. We discuss three factors that can make relabeling an effective strategy: classifier expressiveness, worker accuracy, and budget.
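The budget arithmetic behind this argument is easy to state: with per-worker accuracy p, a fixed label budget buys three times as many training examples under unilabeling as under majority-of-three relabeling, but each relabeled example is cleaner. A small illustration (the numbers are hypothetical):

    def majority3(p):
        # Probability that a 3-worker majority vote is correct when each
        # worker is independently correct with probability p.
        return p ** 3 + 3 * p ** 2 * (1 - p)

    budget = 3000
    for p in (0.55, 0.7, 0.9):
        print(f"p={p}: unilabel -> {budget} examples at accuracy {p:.3f}; "
              f"relabel -> {budget // 3} examples at accuracy {majority3(p):.3f}")

Whether the extra examples or the cleaner labels win out depends on the classifier's expressiveness, which is exactly the trade-off the abstract names.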
CrowdUtility: A Recommendation System for Crowdsourcing Platforms
Chander, Deepthi (Xerox Research Centre India) | Bhattacharya, Sakyajit (Xerox Research Centre India) | Celis, Elisa (Ecole Polytechnique Federale de Lausanne (EPFL)) | Dasgupta, Koustuv (Xerox Research Centre India) | Karanam, Saraschandra (Xerox Research Centre India) | Rajan, Vaibhav (Xerox Research Centre India) | Gupta, Avantika (Xerox Research Centre India)
Crowd workers exhibit varying work patterns, expertise, and quality, leading to wide variability in the performance of crowdsourcing platforms. The onus of choosing a suitable platform on which to post tasks rests mostly with the requester, often leading to poor guarantees and unmet requirements due to the dynamism in the performance of crowd platforms. Towards this end, we demonstrate CrowdUtility, a statistical-modeling-based tool that evaluates multiple crowdsourcing platforms and recommends the platform that best suits the requester's requirements. CrowdUtility uses an online Multi-Armed Bandit framework to schedule tasks while optimizing platform performance. We demonstrate an end-to-end system spanning requirements specification, platform recommendation, and real-time monitoring.
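The abstract does not name the bandit algorithm, so the sketch below uses UCB1 as a representative stand-in, with simulated platform quality (platform names and reward models are made up):

    import math, random

    def ucb1(platforms, rounds):
        # Post each batch to the platform with the highest upper
        # confidence bound on its observed performance.
        counts = {p: 0 for p in platforms}
        means = {p: 0.0 for p in platforms}
        order = list(platforms)
        for t in range(1, rounds + 1):
            if t <= len(order):
                choice = order[t - 1]                 # try every platform once
            else:
                choice = max(order, key=lambda p: means[p]
                             + math.sqrt(2 * math.log(t) / counts[p]))
            reward = platforms[choice]()              # observed batch quality
            counts[choice] += 1
            means[choice] += (reward - means[choice]) / counts[choice]
        return counts

    random.seed(0)
    print(ucb1({"A": lambda: random.gauss(0.7, 0.1),
                "B": lambda: random.gauss(0.5, 0.1)}, rounds=200))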
Quality Control for Crowdsourced Enumeration Tasks
Kajimura, Shunsuke (The University of Tokyo) | Baba, Yukino (National Institute of Informatics) | Kajino, Hiroshi (The University of Tokyo) | Kashima, Hisashi (Kyoto University)
Quality control is one of the central issues in crowdsourcing research. In this paper, we consider a quality control problem for crowdsourced enumeration tasks, which ask workers to enumerate as many possible answers as they can. Since workers do not necessarily provide correct answers, and answers expressing the same idea are not necessarily worded identically, we propose a two-stage quality control method consisting of an answer clustering stage and a reliability estimation stage.
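A toy version of the two stages, using plain string similarity for clustering and answer support as the reliability proxy (both are simple stand-ins for the paper's actual models):

    import difflib

    def cluster_answers(answers, threshold=0.8):
        # Stage 1: greedily group free-text answers whose string
        # similarity exceeds the threshold, so differently-worded
        # answers expressing one idea land in one cluster.
        clusters = []
        for worker, text in answers:
            for c in clusters:
                if difflib.SequenceMatcher(None, text, c["rep"]).ratio() >= threshold:
                    c["support"].add(worker)
                    break
            else:
                clusters.append({"rep": text, "support": {worker}})
        return clusters

    def reliable_clusters(clusters, min_support=2):
        # Stage 2: a crude reliability estimate; keep clusters backed by
        # at least min_support distinct workers.
        return [c["rep"] for c in clusters if len(c["support"]) >= min_support]

    answers = [("w1", "global warming"), ("w2", "globel warming"),
               ("w3", "rising sea levels"), ("w4", "global warming!")]
    print(reliable_clusters(cluster_answers(answers)))   # ['global warming']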
Learning Pronunciation and Accent from The Crowd
Liu, Frederick (National Taiwan University) | Yang, Jeremy Chiaming (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)
Learning a second language is becoming a more popular trend around the world. But learning another language in a place removed from native speakers is difficult, as there is often no one to correct mistakes and no examples to imitate. Using crowdsourcing, we propose a more effective way to learn a second language.
A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms
Lakshminarayanan, Chandrashekar (Indian Institute of Science) | Dubey, Ayush (Indian Institute of Science) | Bhatnagar, Shalabh (Indian Institute of Science) | Balamurugan, Chithralekha (Xerox Research Centre India)
Task starvation leads to huge variation in the completion times of tasks posted to the crowd. The price offered for a task, together with the dynamics of the crowd at the time of posting, affects its completion time. Large organizations and requesters who frequent the crowd at regular intervals to get their tasks done desire predictability in task completion times. Such requesters therefore have to take into account the crowd dynamics at the time of posting and price their tasks accordingly. In this work, we study an instance of this pricing problem and propose a solution based on the framework of Markov Decision Processes (MDPs).
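A minimal sketch of such an MDP, with made-up states (crowd activity), actions (price levels), and dynamics, solved by value iteration; under these assumed numbers the resulting policy charges more when the crowd is slow:

    # All states, actions, and numbers below are illustrative, not from the paper.
    states = ["low", "high"]        # crowd activity at posting time
    actions = [1.0, 2.0]            # candidate prices for the task
    gamma, delay_penalty = 0.9, 1.0

    def p_done(state, price):
        # Assumed dynamics: a higher price and a busier crowd both raise
        # the chance the task completes in the current step.
        return min(1.0, (0.3 if state == "low" else 0.6) * price)

    def expected_reward(state, price):
        # Per-step cost: the delay penalty, plus the price paid in the
        # event the task completes this step.
        return -delay_penalty - p_done(state, price) * price

    V = {s: 0.0 for s in states}
    for _ in range(200):            # value iteration to a (near) fixed point
        V = {s: max(expected_reward(s, a) + gamma * (1 - p_done(s, a)) * V[s]
                    for a in actions) for s in states}

    policy = {s: max(actions, key=lambda a: expected_reward(s, a)
                     + gamma * (1 - p_done(s, a)) * V[s]) for s in states}
    print(policy)                   # {'low': 2.0, 'high': 1.0}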