mturk
Quality Assured: Rethinking Annotation Strategies in Imaging AI
Rädsch, Tim, Reinke, Annika, Weru, Vivienn, Tizabi, Minu D., Heller, Nicholas, Isensee, Fabian, Kopp-Schneider, Annette, Maier-Hein, Lena
This paper does not describe a novel method. Instead, it studies an essential foundation for reliable benchmarking and ultimately real-world application of AI-based image analysis: generating high-quality reference annotations. Previous research has focused on crowdsourcing as a means of outsourcing annotations. However, little attention has so far been given to annotation companies, specifically regarding their internal quality assurance (QA) processes. Therefore, our aim is to evaluate the influence of QA employed by annotation companies on annotation quality and devise methodologies for maximizing data annotation efficacy. Based on a total of 57,648 instance segmented images obtained from a total of 924 annotators and 34 QA workers from four annotation companies and Amazon Mechanical Turk (MTurk), we derived the following insights: (1) Annotation companies perform better both in terms of quantity and quality compared to the widely used platform MTurk. (2) Annotation companies' internal QA only provides marginal improvements, if any. However, improving labeling instructions instead of investing in QA can substantially boost annotation performance. (3) The benefit of internal QA depends on specific image characteristics. Our work could enable researchers to derive substantially more value from a fixed annotation budget and change the way annotation companies conduct internal QA.
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
Gilardi, Fabrizio, Alizadeh, Meysam, Kubli, Maël
Published in the Proceedings of the National Academy of Sciences https://www.pnas.org/doi/10.1073/pnas.2305016120
Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers by about 25 percentage points on average, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003--about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification.
1 Introduction
Many NLP applications require high-quality labeled data, notably to train classifiers or evaluate the performance of unsupervised models. For example, researchers often aim to filter noisy social media data for relevance, assign texts to different topics or conceptual categories, or measure their sentiment or stance.
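The abstract above compares annotators on two quantities, accuracy and intercoder agreement, and gives a cost ratio. The following Python sketch shows one simple way such figures could be computed. It assumes agreement is measured as plain percent agreement between two coders, which may differ from the paper's exact procedure. The function names and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def accuracy(labels: Sequence[str], gold: Sequence[str]) -> float:
    """Share of items whose label matches the gold-standard label."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == g for a, g in zip(labels, gold)) / len(gold)


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of items that two coders labeled identically (assumed metric)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


# Hypothetical toy labels for a relevance task (not data from the paper).
gold = ["relevant", "irrelevant", "relevant", "relevant"]
run_1 = ["relevant", "irrelevant", "relevant", "irrelevant"]
run_2 = ["relevant", "irrelevant", "relevant", "relevant"]

print(accuracy(run_1, gold))            # 0.75
print(percent_agreement(run_1, run_2))  # 0.75

# Cost arithmetic implied by the abstract: at most $0.003 per ChatGPT
# annotation and a roughly thirtyfold gap put MTurk near $0.09 per annotation.
implied_mturk_cost = 0.003 * 30
print(round(implied_mturk_cost, 3))     # 0.09
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic could be substituted without changing the structure of the sketch.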
How AI and crowdsourcing help social scientists sample diverse populations
In 2010, three psychologists from the University of British Columbia published a paper with an intriguing title: The WEIRDest people in the world? Paradoxically, the paper was about Americans. The three scientists had devoted their research careers to cross-cultural variability of human psychology and traveled the seven seas to study small-scale tribal societies. In the paper, they voiced a growing concern about how heavily the humanities -- psychology, economics, sociology, political science and others -- were relying on samples of Americans.
Amazon Mechanical Turk - Wikipedia
Amazon Mechanical Turk (MTurk) is a crowdsourcing website for businesses (known as Requesters) to hire remotely located "crowdworkers" to perform discrete on-demand tasks that computers are currently unable to do. It is operated under Amazon Web Services, and is owned by Amazon.[1] Employers post jobs known as Human Intelligence Tasks (HITs), such as identifying specific content in an image or video, writing product descriptions, or answering questions, among others. Workers, colloquially known as Turkers or crowdworkers, browse among existing jobs and complete them in exchange for a rate set by the employer. To place jobs, the requesting programs use an open application programming interface (API), or the more limited MTurk Requester site.[2] As of April 2019, Requesters could register from only 49 approved countries.[3]
Micro-employment - trend created by large-scale automation
In recent years, a new trend has emerged in the labor market: micro-employment, or making money from small jobs that require only a laptop and Internet access. Platforms that connect customers and workers advertise their services as a convenient and easy way to make money. But micro-employment often means a lack of choice, low earnings, and monotonous, ambiguous tasks. When shoppers in London's Hackney area visit the new Amazon Fresh store, they no longer pay a checkout operator but simply walk out with their goods. Amazon describes it as an effortless consumer experience. The rise in automated stores during the pandemic is just the tip of the iceberg.
Big tech's push for automation hides the grim reality of 'microwork' Phil Jones
When customers in the London borough of Hackney shop in the new Amazon Fresh store, they no longer pay a checkout operator but simply walk out with their goods. Amazon describes "just walk out shopping" as an effortless consumer experience. The rise of automated stores during the pandemic is just the tip of the iceberg. Floor-cleaning robots have been introduced in hospitals, supermarkets and schools. Fast-food restaurants are employing burger-grilling robots and chatbots.
AI: Ghost workers demand to be seen and heard
Artificial intelligence and machine learning exist on the back of a lot of hard work from humans. Alongside the scientists, there are thousands of low-paid workers whose job it is to classify and label data - the lifeblood of such systems. But increasingly there are questions about whether these so-called ghost workers are being exploited. As we train the machines to become more human, are we actually making the humans work more like machines? And what role do these workers play in shaping the AI systems that are increasingly controlling every aspect of our lives?
5 Best Data Collection Companies for Machine Learning Projects
Data is the bedrock of all machine learning systems. As such, working with the right data collection company is critical in order to solve a supervised machine learning problem. If you don't have a particular goal or project in mind, there is a wealth of open data available on the web to practice with. However, if you're looking to tackle a specific problem, chances are you'll need to collect data yourself or work with a company that can collect data for you. There are many data collection companies that provide crowdsourcing services to help individuals and corporations gather data at scale.
When AI needs a human assistant
For years, Amazon's Mechanical Turk (mTurk) has been a kind of open secret in the tech world, a place where fledgling algorithms can hire human labor on the cheap. If you need a hundred people to trace the boundaries of an object or fill out a survey, it's the single best place to make it happen. But while the project itself is well-known, it's always slightly embarrassing when a company turns up there. In 2017, Expensify was spotted asking mTurk workers to enter data from receipts, leading the company to rush out a statement insisting that the mTurk project had nothing to do with Expensify's main app. In part, it was a privacy issue, but mostly it was embarrassing: Expensify was built on a simple piece of technology -- the ability to extract data from a photo of a receipt -- and the mTurk tasks made it look like that technology was a sham. What if it was human beings extracting that data all along?
Comparison-Based Framework for Psychophysics: Lab versus Crowdsourcing
Haghiri, Siavash, Wichmann, Felix, von Luxburg, Ulrike
Traditionally, psychophysical experiments are conducted by repeated measurements on a few well-trained participants under well-controlled conditions, often resulting in, if done properly, high quality data. In recent years, however, crowdsourcing platforms are becoming increasingly popular means of data collection, measuring many participants at the potential cost of obtaining data of worse quality. In this paper we study whether the use of comparison-based (ordinal) data, combined with machine learning algorithms, can boost the reliability of crowdsourcing studies for psychophysics, such that they can achieve performance close to a lab experiment. To this end, we compare three setups: simulations, a psychophysics lab experiment, and the same experiment on Amazon Mechanical Turk. All these experiments are conducted in a comparison-based setting where participants have to answer triplet questions of the form "is object x closer to y or to z?". We then use machine learning to solve the triplet prediction problem: given a subset of triplet questions with corresponding answers, we predict the answer to the remaining questions. Considering the limitations and noise on MTurk, we find that the accuracy of triplet prediction is surprisingly close---but not equal---to our lab study.
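To make the triplet setting above concrete, here is a minimal Python sketch of how triplet questions can be answered from point coordinates and how a candidate embedding can be scored on held-out triplets. It does not implement the paper's machine learning method: fitting an embedding from the observed triplets is left out. The 2D points, the squared Euclidean distance, and all function names are illustrative assumptions.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triplet = Tuple[int, int, int]


def dist2(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def answer_triplet(points: Sequence[Point], x: int, y: int, z: int) -> bool:
    """Answer 'is object x closer to y or to z?'; True means closer to y."""
    return dist2(points[x], points[y]) < dist2(points[x], points[z])


def all_triplets(n: int) -> List[Triplet]:
    """All (x, y, z) with distinct indices and y < z."""
    return [
        (x, y, z)
        for x in range(n)
        for y, z in itertools.combinations(range(n), 2)
        if x != y and x != z
    ]


def triplet_accuracy(candidate: Sequence[Point],
                     triplets: Sequence[Triplet],
                     answers: Sequence[bool]) -> float:
    """Share of triplets on which the candidate embedding gives the recorded answer."""
    hits = sum(answer_triplet(candidate, *t) == a for t, a in zip(triplets, answers))
    return hits / len(triplets)


# Hypothetical ground truth: 8 objects placed at random in the unit square.
rng = random.Random(0)
true_points = [(rng.random(), rng.random()) for _ in range(8)]

triplets = all_triplets(8)
rng.shuffle(triplets)
answers = [answer_triplet(true_points, *t) for t in triplets]

# Split into observed triplets (for fitting, not shown) and held-out ones.
split = len(triplets) // 2
held_out, held_out_answers = triplets[split:], answers[split:]

# A mirrored copy preserves all pairwise distances, so it scores perfectly too.
mirrored = [(-x, y) for x, y in true_points]
print(triplet_accuracy(true_points, held_out, held_out_answers))  # 1.0
print(triplet_accuracy(mirrored, held_out, held_out_answers))     # 1.0
```

The mirrored example shows that triplet answers can only identify an embedding up to distance-preserving transformations, which is why the sketch scores predictions on triplets instead of comparing coordinates.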