rater
- North America > United States > Washington (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > Dominican Republic (0.04)
- Health & Medicine (1.00)
- Education > Educational Setting (0.46)
- Leisure & Entertainment > Games (0.46)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Information Technology > Communications > Mobile (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Human Computer Interaction (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Research Report > Experimental Study (0.98)
- Research Report > New Finding (0.70)
- Personal (0.68)
- Law (1.00)
- Information Technology > Security & Privacy (0.69)
- Asia > India (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States (0.14)
- South America > Brazil (0.04)
- North America > Mexico (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Beyond Top Activations: Efficient and Reliable Crowdsourced Evaluation of Automated Interpretability
Oikarinen, Tuomas, Yan, Ge, Kulkarni, Akshay, Weng, Tsui-Wei
Interpreting individual neurons or directions in activation space is an important topic in mechanistic interpretability. Numerous automated interpretability methods have been proposed to generate such explanations, but it remains unclear how reliable these explanations are, and which methods produce the most accurate descriptions. While crowdsourced evaluations are commonly used, existing pipelines are noisy, costly, and typically assess only the highest-activating inputs, leading to unreliable results. In this paper, we introduce two techniques to enable cost-effective and accurate crowdsourced evaluation of automated interpretability methods beyond top-activating inputs. First, we propose Model-Guided Importance Sampling (MG-IS) to select the most informative inputs to show human raters. In our experiments, we show this reduces the number of inputs needed to reach the same evaluation accuracy by ~13x. Second, we address label noise in crowdsourced ratings through Bayesian Rating Aggregation (BRAgg), which allows us to reduce the number of ratings per input required to overcome noise by ~3x. Together, these techniques reduce the evaluation cost by ~40x, making large-scale evaluation feasible. Finally, we use our methods to conduct a large-scale crowdsourced study comparing recent automated interpretability methods for vision networks.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.88)
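The abstract above names Bayesian Rating Aggregation but does not describe its mechanics. As a rough illustration of the general idea only (combining several noisy yes/no ratings with a prior via Bayes' rule), here is a minimal sketch. It is not the paper's BRAgg; the function name `posterior_match`, the symmetric-noise rater model, and the default values for `prior` and `rater_accuracy` are all assumptions made for this example.

```python
from math import prod


def posterior_match(ratings, prior=0.5, rater_accuracy=0.8):
    """Posterior probability that an input truly matches an explanation.

    ratings: list of bools, one per rater (True = rater says "matches").
    prior: assumed prior probability of a match (hypothetical value).
    rater_accuracy: assumed probability that a rater reports the true
        label, identical for all raters (a simplifying assumption).
    """
    # Likelihood of the observed ratings if the input truly matches.
    like_match = prod(
        rater_accuracy if r else 1 - rater_accuracy for r in ratings
    )
    # Likelihood of the observed ratings if the input does not match.
    like_no_match = prod(
        1 - rater_accuracy if r else rater_accuracy for r in ratings
    )
    numerator = prior * like_match
    return numerator / (numerator + (1 - prior) * like_no_match)


if __name__ == "__main__":
    # Three agreeing raters push the posterior well above a single rating.
    print(posterior_match([True]))
    print(posterior_match([True, True, True]))
    # A skeptical prior tempers a single positive rating.
    print(posterior_match([True], prior=0.1))
```

Under these assumptions, an informative prior can stand in for some ratings: with a low prior, one positive rating is not enough to conclude a match, while with a neutral prior, agreeing raters quickly dominate.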
Stable diffusion models reveal a persisting human and AI gap in visual creativity
Rondini, Silvia, Alvarez-Martin, Claudia, Angermair-Barkai, Paula, Penacchio, Olivier, Paz, M., Pelowski, Matthew, Dediu, Dan, Rodriguez-Fornells, Antoni, Cerda-Company, Xim
While recent research suggests Large Language Models match human creative performance in divergent thinking tasks, visual creativity remains underexplored. This study compared image generation by human participants (Visual Artists and Non Artists) with that of an image generation AI model (two prompting conditions with varying human input: high for Human Inspired, low for Self Guided). Human raters (N=255) and GPT-4o evaluated the creativity of the resulting images. We found a clear creativity gradient, with Visual Artists being the most creative, followed by Non Artists, then Human Inspired generative AI, and finally Self Guided generative AI. Increased human guidance strongly improved GenAI's creative output, bringing its productions close to those of Non Artists. Notably, human and AI raters also showed vastly different creativity judgment patterns. These results suggest that, in contrast to language-centered tasks, GenAI models may face unique challenges in visual domains, where creativity depends on perceptual nuance and contextual sensitivity, distinctly human capacities that may not be readily transferable from language models.
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Meet the AI workers who tell their friends and family to stay away from AI
AI workers said they distrust the models they work on because of a consistent emphasis on rapid turnaround time at the expense of quality. Krista Pawloski remembers the single defining moment that shaped her opinion on the ethics of artificial intelligence. As an AI worker on Amazon Mechanical Turk - a marketplace that allows companies to hire workers to perform tasks like entering data or matching an AI prompt with its output - Pawloski spends her time moderating and assessing the quality of AI-generated text, images and videos, as well as some factchecking. Roughly two years ago, while working from home at her dining room table, she took up a job designating tweets as racist or not. When she was presented with a tweet that read "Listen to that mooncricket sing", she almost clicked on the "no" button before deciding to check the meaning of the word "mooncricket", which, to her surprise, was a racial slur against Black Americans.
- Europe > Ukraine (0.05)
- Oceania > Australia (0.04)
- North America > United States > Montana (0.04)
- Leisure & Entertainment > Sports (0.69)
- Law (0.68)
- Government > Regional Government (0.48)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.87)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)