Resolving the Human-Subjects Status of ML's Crowdworkers
As the focus of machine learning (ML) has shifted toward settings characterized by massive datasets, researchers have become reliant on crowdsourcing platforms [13, 25]. For the natural language processing (NLP) task of passage-based question answering (QA) alone, more than 15 new datasets, each containing at least 50k annotations, have been introduced since 2016. Before then, available QA datasets contained orders of magnitude fewer examples. The ability to construct such enormous resources derives mostly from the liquid market for temporary labor on crowdsourcing platforms such as Amazon Mechanical Turk. These practices, however, have raised ethical concerns, including low wages [5, 26]; disparities in access to, and in the benefits and harms of, the applications developed [1, 20]; the reproducibility of proposed methods [4, 21]; and the potential for unfairness and discrimination in the resulting technologies [9, 14].
Mar-26-2024, 16:44:51 GMT