annotator
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
Wang, Tong, Xu, Yiqing, Yang, Leo Yang
Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen's $ฮบ$, and selects features by residual held-out predictive gain. A stylized analysis connects the $ฮบ$ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human--human and human--LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.
I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI
For screenwriters like me--and job seekers all over--AI gig work is the new waiting tables. In eight months, I've done 20 of these soul-crushing contracts for five different platforms. My name on the platform is ri611. I work as an AI trainer. I assess whether a chatbot's tone is natural or flat, affected or annoying. I identify patterns in pictures of furniture; search the internet for group photos of strangers whom I'll eliminate from the portrait, one by one. I trawl through bizarre videos so I can annotate and time-stamp the barking of a dog, the moment a stranger walks past a window, the precise millisecond a balloon pops. I generate anime sex scenes and decapitate young women, coax LLMs into giving me recipes for bombs made of household items, and generate invites to a reprise of January 6 at the White House, all as part of a red team whose purpose is to test safety precautions and probe weaknesses. I work for companies with names like Mercor and Outlier and Task-ify and Turing and Handshake and Micro1. In my "other" career, I am a Hollywood writer and showrunner. I create prime-time TV, usually featuring a middle-class white lady having the worst day of her life, with some salt-of-the-earth police interference to raise the stakes. You can find my shows on Paramount and Hulu and the BBC.
5fc47800ee5b30b8777fdd30abcaaf3b-Supplemental-Conference.pdf
Having defined and validated the pairwise feedback simulator and evaluations in AlpacaFarm, we569 now turn our attention to studying methods that learn from pairwise feedback on AlpacaFarm.570 Unfortunately, the lack of existing benchmarks for learning from pairwise feedback for instruction571 following means that there has not been any open study of these methods in the instruction-following572 setting. In the remainder of this section, we will introduce our reference methods, which fall into two575 categories based on whether they fit a surrogate reward model as part of the learning process.576 FeedME is a method proposed by OpenAI [45] that incorporates human feedback578 with supervised fine-tuning on model generations that are rated 7/7 by human labelers. We adapt579 this approach to the pairwise feedback setting and call this baseline binary FeedME. This approach580 fine-tunes the SFT model on the chosen response in each preference pair with supervised learning.581 Motivated by controllable generation through conditioning [27, 34,582 29, 21], we propose binary reward conditioning, a baseline method that fine-tunes the SFT model583 with the feedback data Dpairwise by conditioning instances with either a positive or negative control584 token. Specifically, for each instance (x,y0,y1,z) 2D pairwise, the string concatenation of instruction585 x and response yz denoted as [x,yz] is prepended with the positive token and used in supervised586 fine-tuning (similarly [x,y1 z]is prepended with the negative token). This process creates a modified587 demonstration dataset that is double the size of Dpairwise. At test time, we draw samples from the588 fine-tuned model conditioned on the positive token.589 A.2 Methods that optimize a surrogate reward function590 We now describe methods that incorporate feedback by first building a surrogate reward model with591 pairwise feedback data. To start, we describe the step of training the surrogate reward model.592 While this can be a powerful approach,596 we will see that it can also lead to over-optimization [19] where models learn to exploit the reward597 model rather than achieve high true reward. We now describe 4 methods that leverage the surrogate598 reward model.599
StoryBench: AMultifaceted Benchmark for Continuous Story Visualization
Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption used often in video datasets. To fill this gap, we collect comprehensive human annotations on three existing datasets, and introduce StoryBench: a new, challenging multi-task benchmark to reliably evaluate forthcoming text-to-video models. Our benchmark includes three video generation tasks of increasing difficulty: action execution, where the next action must be generated starting from a conditioning video; story continuation, where a sequence of actions must be executed starting from a conditioning video; and story generation, where a video must be generated from only text prompts. We evaluate small yet strong text-to-video baselines, and show the benefits of training on story-like data algorithmically generated from existing video captions. Finally, we establish guidelines for human evaluation of video stories, and reaffirm the need of better automatic metrics for video generation. StoryBench aims at encouraging future research efforts in this exciting new area. Work completed during an internship at Google.