 crowdsourcer



Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing

Neural Information Processing Systems

Digital crowdsourcing (CS) is a modern approach to performing certain large projects using small contributions from a large crowd. In CS, a taskmaster typically breaks the project down into small batches of tasks and assigns them to so-called workers with imperfect skill levels. The crowdsourcer then collects and analyzes the results to make inferences that serve the purpose of the project. In this work, the CS problem, viewed as a human-in-the-loop computation problem, is modeled and analyzed in an information-theoretic rate-distortion framework. The purpose is to identify the ultimate fidelity that can be achieved with a given budget by any form of query to the crowd and any decoding (inference) algorithm. The results are established through a joint source-channel (de)coding scheme, whose encoder and decoder represent the query scheme and the inference, respectively, over parallel noisy channels that model workers with imperfect skill levels. We also present and analyze a query scheme dubbed k-ary incidence coding and study optimized query pricing in this setting.
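To make the budget-fidelity trade-off concrete, here is a minimal Python sketch of the simplest baseline, not the paper's rate-distortion bound or its k-ary incidence coding: each binary label is sent to r workers, each worker is modeled as a binary symmetric channel that answers correctly with probability p_correct, and the crowdsourcer decodes by majority vote. The uniform per-query price, the even split of the budget across items, and the function names are hypothetical assumptions made for illustration.

```python
from math import comb


def majority_error(p_correct: float, r: int) -> float:
    """Exact probability that a majority vote of r independent workers,
    each correct with probability p_correct, returns the wrong binary label.
    Ties (possible only when r is even) are broken by a fair coin."""
    err = 0.0
    for k in range(r + 1):  # k = number of workers who answered correctly
        prob = comb(r, k) * p_correct**k * (1 - p_correct) ** (r - k)
        if 2 * k < r:       # correct answers are a minority: decoding fails
            err += prob
        elif 2 * k == r:    # tie: the coin flip is wrong half of the time
            err += 0.5 * prob
    return err


def budget_fidelity_curve(n_items: int, budget: float,
                          price_per_query: float, p_correct: float):
    """Assumed setting: every query costs price_per_query and the budget is
    spread evenly over n_items. Returns (redundancy, error) for every
    affordable redundancy level r >= 1."""
    max_r = int(budget // (price_per_query * n_items))
    return [(r, majority_error(p_correct, r)) for r in range(1, max_r + 1)]
```

For example, with 100 items, a budget of 300, a unit query price and p_correct = 0.7, the affordable redundancies are r = 1, 2, 3, with error rates 0.3, 0.3 and 0.216. Going from r = 1 to r = 2 buys nothing under majority voting with random tie-breaking; results like those in the paper characterize the best fidelity that any query scheme and decoder can reach for the same budget.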


Towards AI-Empowered Crowdsourcing

arXiv.org Artificial Intelligence

Crowdsourcing, in which human intelligence and productivity are dynamically mobilized to tackle tasks too complex for automation alone to handle, has grown to be an important research topic and has inspired new businesses (e.g., Uber, Airbnb). Over the years, crowdsourcing has morphed from a platform where workers and tasks are matched up manually into one that leverages data-driven algorithmic management approaches powered by artificial intelligence (AI) to achieve increasingly sophisticated optimization objectives. In this paper, we provide a survey presenting a unique systematic overview of how AI can empower crowdsourcing to improve its efficiency, which we refer to as AI-Empowered Crowdsourcing (AIEC). We propose a taxonomy that divides AIEC into three major areas: 1) task delegation, 2) motivating workers, and 3) quality control, focusing on the major objectives that need to be accomplished. We discuss the limitations and insights, and curate the challenges of doing research in each of these areas to highlight promising future research directions.


Efficient crowdsourcing of crowd-generated microtasks

arXiv.org Machine Learning

Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask proposal leads to a growing set of tasks that may overwhelm limited crowdsourcer resources. Crowdsourcers can employ methods to utilize their resources efficiently, but algorithmic approaches to efficient crowdsourcing generally require a fixed task set of known size. In this paper, we introduce *cost forecasting* as a means for a crowdsourcer to use efficient crowdsourcing algorithms with a growing set of microtasks. Cost forecasting allows the crowdsourcer to decide between eliciting new tasks from the crowd or receiving responses to existing tasks based on whether or not new tasks will cost less to complete than existing tasks, efficiently balancing resources as crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data show that cost forecasting leads to improved accuracy. Accuracy and efficiency gains for crowd-generated microtasks hold the promise to further leverage the creativity and wisdom of the crowd, with applications such as generating more informative and diverse training data for machine learning applications and improving the performance of user-generated content and question-answering platforms.
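The abstract does not spell out how the forecast is computed. The following Python sketch is a hypothetical decision rule in the spirit of the described comparison: it forecasts the cost of a brand-new task as the mean number of responses that previously completed tasks required, and compares that forecast with the estimated remaining responses of the cheapest open task. Both the plain-average forecast and the remaining-cost estimates passed in are assumptions, not the authors' method.

```python
from statistics import mean


def choose_action(completed_costs, open_remaining):
    """Hypothetical cost-forecasting rule (illustration only).

    completed_costs: number of responses each already-completed task needed;
        their mean is used as the forecast cost of a brand-new task.
    open_remaining: dict mapping task_id -> estimated responses still needed.
    Returns ("elicit_new", None) or ("query_existing", task_id).
    """
    if not open_remaining:
        # Nothing left to work on: the only option is to ask for new tasks.
        return ("elicit_new", None)
    cheapest_id = min(open_remaining, key=open_remaining.get)
    if not completed_costs:
        # No history to forecast from: keep working on existing tasks.
        return ("query_existing", cheapest_id)
    forecast_new = mean(completed_costs)
    if forecast_new < open_remaining[cheapest_id]:
        # A new task is expected to be cheaper to finish than any open one.
        return ("elicit_new", None)
    return ("query_existing", cheapest_id)
```

For instance, if finished tasks took 3 and 5 responses (forecast 4) and the cheapest open task still needs about 6, the rule elicits a new task; if the cheapest open task needs only 2 more responses, it queries that task instead.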


Accurate inference of crowdsourcing properties when using efficient allocation strategies

arXiv.org Machine Learning

Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers to easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias complicates the inference of problem-wide properties, such as typical task difficulty, and of crowd properties, such as worker completion times: important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge from a given crowdsourced dataset.
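DEPS itself is not detailed in the abstract. As a point of reference, the textbook way to correct for a selection mechanism whose inclusion probabilities are known is inverse-probability weighting; the Python sketch below implements its self-normalized (Hájek) form for estimating a mean property such as task difficulty. It is a stand-in technique for illustration, not DEPS, and it assumes the allocation strategy exposes each completed task's inclusion probability.

```python
def ipw_mean(values, inclusion_probs):
    """Self-normalized inverse-probability-weighted (Hajek) estimate of a
    population mean from a biased sample, where item i was included with
    known probability inclusion_probs[i]."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(p <= 0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be positive")
    # Rarely selected items stand in for many unselected ones, so they
    # receive proportionally larger weights.
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def naive_mean(values):
    """Unweighted sample mean, which inherits the allocation bias."""
    return sum(values) / len(values)
```

If easy tasks (difficulty 1) are completed with probability 0.8 and hard ones (difficulty 5) with probability 0.2 from a pool with equal numbers of each, a typical sample of eight easy and two hard tasks has a naive mean of 1.8, while the weighted estimate recovers the pool mean of 3.0.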


Crowd ideation of supervised learning problems

arXiv.org Artificial Intelligence

Crowdsourcing is an important avenue for collecting machine learning data, but crowdsourcing can go beyond simple data collection by employing the creativity and wisdom of crowd workers. Yet crowd participants are unlikely to be experts in statistics or predictive modeling, and it is not clear how well non-experts can contribute creatively to the process of machine learning. Here we study an end-to-end crowdsourcing algorithm in which groups of non-expert workers propose supervised learning problems, rank and categorize those problems, and then provide data to train predictive models on those problems. Problem proposal includes and extends feature engineering because workers propose the entire problem: not only the input features but also the target variable. We show that workers without machine learning experience can collectively construct useful datasets and that predictive models can be learned on these datasets. In our experiments, the problems proposed by workers covered a broad range of topics, from politics and current events to problems capturing health behavior, demographics, and more. Workers also favored questions showing positively correlated relationships, which has interesting implications given that many supervised learning methods perform equally well with strong negative correlations. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of problems proposed by workers. In general, shifting the focus of machine learning tasks from designing and training individual predictive models to problem proposal allows crowdsourcers to design requirements for problems of interest and then guide workers towards contributing to the most suitable problems.


Infusing Human Factors into Algorithmic Crowdsourcing

AAAI Conferences

The emergence of crowdsourcing systems has provided a viable mechanism for incorporating humans into the computational loop at large scale and in real time. This offers an unprecedented opportunity to study how artificial intelligence (AI) techniques and humans can collaborate to solve problems. An important challenge in crowdsourcing is how to make optimal use of human resources, as people have different skills and their availability may be limited. In this paper, we provide the research community with a new dataset derived from an online game-based platform to address this challenge. Six crowdsourcing task allocation scenarios with different overall workload levels and worker population characteristics were presented to over 400 players to solve. With close to 3,000 game sessions and over 300,000 task allocation decisions from human and AI players, the dataset provides an efficient focal point for the research community to design solutions that can sustainably tap into the pool of human resources through crowdsourcing.