annotator
I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI
For screenwriters like me--and job seekers all over--AI gig work is the new waiting tables. In eight months, I've done 20 of these soul-crushing contracts for five different platforms.

My name on the platform is ri611. I work as an AI trainer. I assess whether a chatbot's tone is natural or flat, affected or annoying. I identify patterns in pictures of furniture; search the internet for group photos of strangers whom I'll eliminate from the portrait, one by one. I trawl through bizarre videos so I can annotate and time-stamp the barking of a dog, the moment a stranger walks past a window, the precise millisecond a balloon pops.

I generate anime sex scenes and decapitate young women, coax LLMs into giving me recipes for bombs made of household items, and generate invites to a reprise of January 6 at the White House, all as part of a red team whose purpose is to test safety precautions and probe weaknesses. I work for companies with names like Mercor and Outlier and Task-ify and Turing and Handshake and Micro1.

In my "other" career, I am a Hollywood writer and showrunner. I create prime-time TV, usually featuring a middle-class white lady having the worst day of her life, with some salt-of-the-earth police interference to raise the stakes. You can find my shows on Paramount and Hulu and the BBC.
- Leisure & Entertainment (1.00)
- Health & Medicine (0.93)
- Media > Television (0.48)
- (2 more...)
From Ground Truth to Measurement: A Statistical Framework for Human Labeling
Chew, Robert, Eckman, Stephanie, Kern, Christoph, Kreuter, Frauke
Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent interpretations, and simple mistakes. Machine learning research commonly treats all disagreement as noise, which obscures these distinctions and limits our understanding of what models actually learn. This paper reframes annotation as a measurement process and introduces a statistical framework for decomposing labeling outcomes into interpretable sources of variation: instance difficulty, annotator bias, situational noise, and relational alignment. The framework extends classical measurement-error models to accommodate both shared and individualized notions of truth, reflecting the traditional measurement-error and human-label-variation interpretations of error, and provides a diagnostic for assessing which regime better characterizes a given task. Applying the proposed model to a multi-annotator natural language inference dataset, we find empirical evidence for all four theorized components and demonstrate the effectiveness of our approach. We conclude with implications for data-centric machine learning and outline how this approach can guide the development of a more systematic science of labeling.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland (0.04)
- (4 more...)
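The decomposition this abstract describes--labeling variation split into instance difficulty, annotator bias, and situational noise--can be illustrated with a toy simulation. The variance values, the additive-effects data model, and the method-of-moments estimator below are illustrative assumptions, not the paper's actual framework:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_annotators = 200, 30

# Hypothetical variance components (chosen for illustration):
# per-item difficulty, per-annotator bias, and situational noise.
difficulty = rng.normal(0, 0.8, size=(n_items, 1))       # var = 0.64
bias = rng.normal(0, 0.5, size=(1, n_annotators))        # var = 0.25
noise = rng.normal(0, 0.3, size=(n_items, n_annotators)) # var = 0.09
scores = difficulty + bias + noise  # observed label scores

# Method-of-moments decomposition: row and column means recover the
# item and annotator effects; what remains is situational noise.
grand = scores.mean()
item_effect = scores.mean(axis=1, keepdims=True) - grand
annot_effect = scores.mean(axis=0, keepdims=True) - grand
residual = scores - grand - item_effect - annot_effect

# Each estimated variance should land close to its true component.
print(item_effect.var())   # close to 0.64 (instance difficulty)
print(annot_effect.var())  # close to 0.25 (annotator bias)
print(residual.var())      # close to 0.09 (situational noise)
```

With more items than annotators, the difficulty estimate concentrates faster than the bias estimate -- one reason multi-annotator datasets need many raters per item, not just many items.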
Data Distribution Valuation Using Generalized Bayesian Inference
Nguyen, Cuong N., Nguyen, Cuong V.
We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and can be applied to various applications. For this problem, we develop a novel framework called Generalized Bayes Valuation that utilizes generalized Bayesian inference with a loss constructed from transferability measures. This framework allows us to solve, in a unified way, seemingly unrelated practical problems, such as annotator evaluation and data augmentation. Using Bayesian principles, we further broaden the applicability of our framework by extending it to the continuous data stream setting. Our experimental results confirm the effectiveness and efficiency of our framework in different real-world scenarios.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California (0.04)
- Asia > Singapore (0.04)
- Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)
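The generalized-Bayesian idea in this abstract--weighting candidate data distributions by exp(-lam * loss), where the loss plays the role of a transferability measure--can be sketched as follows. The RBF-kernel MMD loss, the temperature `lam`, and the two-candidate setup are illustrative choices, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def mmd2(x, y, gamma=0.5):
    """Biased squared MMD with an RBF kernel: a simple stand-in for a
    transferability-style loss between two 1-D samples."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-gamma * d ** 2).mean()
    return k(x, x) + k(y, y) - 2 * k(x, y)

# Reference (validation) sample and two candidate data distributions.
reference = rng.normal(0.0, 1.0, 500)
candidates = {
    "matched": rng.normal(0.0, 1.0, 500),  # same distribution
    "shifted": rng.normal(2.0, 1.0, 500),  # mean-shifted distribution
}

# Gibbs-posterior-style valuation: weight each candidate by
# exp(-lam * loss), then normalize. lam is a hypothetical temperature.
lam = 50.0
losses = {name: mmd2(x, reference) for name, x in candidates.items()}
weights = {name: np.exp(-lam * l) for name, l in losses.items()}
total = sum(weights.values())
values = {name: w / total for name, w in weights.items()}
# The matched sample incurs a near-zero loss, so it receives almost
# all of the valuation mass; the shifted sample receives almost none.
```

The exponential weighting means small differences in loss translate into large differences in value once `lam` is large -- the same temperature trade-off that governs any Gibbs posterior.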