Auditing the Auditors: Does Community-based Moderation Get It Right?
Alimohammadi, Yeganeh, Huang, Karissa, Borgs, Christian, Chayes, Jennifer
Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted consensus-based auditing that ties users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to a latent-factor model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance while avoiding penalization of disagreement.
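The two-stage scheme described in this abstract can be illustrated in a few lines. The snippet below is an assumption-laden toy, not the authors' estimator: stage one removes item and contributor effects (a simple stand-in for the latent-factor model), and stage two weights each contributor by the inverse variance of their residuals, so a contributor who dissents by a consistent margin keeps full influence.

```python
import numpy as np

def two_stage_aggregate(ratings):
    """ratings: dict {(contributor, item): value in [0, 1]}.
    Stage 1: remove global, item, and contributor effects (standing in
    for the latent-factor model in the abstract). Stage 2: weight each
    contributor by the stability (inverse variance) of their residuals."""
    contributors = sorted({c for c, _ in ratings})
    items = sorted({i for _, i in ratings})
    M = np.full((len(contributors), len(items)), np.nan)
    for (c, i), v in ratings.items():
        M[contributors.index(c), items.index(i)] = v

    # Stage 1: residuals after removing global, item, and contributor means.
    mu = np.nanmean(M)
    item_eff = np.nanmean(M, axis=0) - mu
    contrib_eff = np.nanmean(M, axis=1) - mu
    resid = M - mu - item_eff[None, :] - contrib_eff[:, None]

    # Stage 2: stability = inverse residual variance (with a small floor).
    stability = 1.0 / (np.nanvar(resid, axis=1) + 1e-3)
    weights = stability / stability.sum()

    # Weighted aggregate per item, ignoring missing ratings.
    mask = ~np.isnan(M)
    W = np.where(mask, weights[:, None], 0.0)
    agg = np.nansum(np.where(mask, M, 0.0) * W, axis=0) / W.sum(axis=0)
    return dict(zip(items, agg)), dict(zip(contributors, weights))
```

Note that a contributor who always rates a fixed offset away from everyone else produces low-variance residuals and therefore a high weight; that is the property separating this design from consensus-based auditing, which would penalize the same contributor.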
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
- Asia > Russia (0.14)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.04)
- (3 more...)
- Health & Medicine (0.93)
- Media > News (0.48)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > California (0.04)
- (2 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Texas (0.15)
- North America > United States > California (0.14)
- Media > News (1.00)
- Leisure & Entertainment > Sports (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- (4 more...)
Wikipedia's Existential Threats Feel Greater Than Ever
As the free online encyclopedia turns 25, it's facing political opposition, AI scraping, dwindling volunteers, and a public that may no longer believe in its ideals. In 2010, the FBI sent Wikipedia a letter that would be intimidating for any organization to receive. The missive demanded that the free online encyclopedia remove the FBI's logo from an entry about the agency, claiming that reproducing the emblem was illegal and punishable with fines, imprisonment, "or both." Rather than back down, a lawyer for the Wikimedia Foundation, which hosts Wikipedia, shot back a sharp refusal outlining how the FBI's interpretation of the relevant statute was incorrect and saying that Wikipedia was "prepared to argue our view in court." It worked--the FBI dropped the matter.
- Asia > China (0.05)
- South America > Venezuela > Capital District > Caracas (0.04)
- North America > United States > California (0.04)
- (5 more...)
Contributor: Rob Reiner reshaped how California understands and invests in children
Hollywood director Rob Reiner engineered Proposition 10, a 1998 tobacco tax that created First 5 California, generating more than $11 billion for early childhood programs statewide. After his tragic death Sunday, the world remembers Rob Reiner as a cinematic force -- and he was one, as an unforgettable presence on the ambitious 1970s sitcom "All in the Family" and later as the director of beloved films. I came to know him differently: as a restless thinker who transformed his own life story into bold public policy, reshaping how California understands and invests in its youngest children.
- North America > United States > California > Los Angeles County > Los Angeles (0.06)
- North America > United States > California > Alameda County (0.05)
- North America > United States > Alabama (0.04)
- (3 more...)
- Media (1.00)
- Law (1.00)
- Education (1.00)
- (2 more...)
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
Sinha, Samridhi Raj, Sheth, Rajvee, Upperwal, Abhishek, Singh, Mayank
The rapid evolution of Large Language Models has underscored the need for evaluation frameworks that are globally applicable, flexible, and modular, and that support a wide range of tasks, model types, and linguistic settings. We introduce EKA-EVAL, a unified, end-to-end framework that combines a zero-code web interface and an interactive CLI to ensure broad accessibility. It integrates 50+ multilingual benchmarks across nine evaluation categories, supports local and proprietary models, and provides 11 core capabilities through a modular, plug-and-play architecture. Designed for scalable, multilingual evaluation with support for low-resource languages, EKA-EVAL is, to the best of our knowledge, the first suite to offer such comprehensive coverage in a single platform. Comparisons against five existing baselines indicate at least 2x improvements on key usability measures, along with the highest user satisfaction, faster setup times, and consistent benchmark reproducibility. The framework is open-source and publicly available at https://github.com/lingo-iitgn/eka-eval.
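A "modular, plug-and-play architecture" for benchmarks typically reduces to a registry that both the CLI and the web interface query. The sketch below is a guess at that general pattern, not EKA-EVAL's actual API; every class and function name here is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Benchmark:
    name: str
    category: str           # e.g. one of the nine evaluation categories
    languages: List[str]    # language codes the benchmark covers
    run: Callable[[Callable[[str], str]], float]  # model fn -> score

REGISTRY: Dict[str, Benchmark] = {}

def register(bench: Benchmark) -> None:
    """Plug a benchmark into the suite; front-ends list REGISTRY."""
    REGISTRY[bench.name] = bench

def evaluate(model: Callable[[str], str],
             category: Optional[str] = None,
             language: Optional[str] = None) -> Dict[str, float]:
    """Run every registered benchmark matching the optional filters."""
    results = {}
    for b in REGISTRY.values():
        if category and b.category != category:
            continue
        if language and language not in b.languages:
            continue
        results[b.name] = b.run(model)
    return results
```

Registering a toy exact-match benchmark and calling `evaluate(my_model, language="hi")` would then run only the benchmarks covering that language, which is the kind of filtering a multilingual suite needs.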
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Chervyakov, Artem, Kharitonov, Alexander, Zadorozhny, Pavel, Adamenko, Pavel, Levichev, Rodion, Vorobev, Dmitrii, Salikhov, Dmitrii, Valeev, Aidar, Pestova, Alena, Dziuba, Maria, Alimova, Ilseyar, Zavgorodnev, Artem, Medvedev, Aleksandr, Moiseev, Stanislav, Bruches, Elena, Grebenkin, Daniil, Derunets, Roman, Vikulov, Vladimir, Emelyanov, Anton, Babaev, Dmitrii, Ivanov, Vladimir V., Malykh, Valentin, Fenogenova, Alena
Advancements in LLMs have enhanced task automation in software engineering; however, current evaluations primarily focus on natural language tasks, overlooking code quality. Most benchmarks prioritize high-level reasoning over executable code and real-world performance, leaving gaps in understanding the true capabilities and risks of these models in production. To address this issue, we propose MERA Code, a new addition to the MERA benchmark family, specifically focused on evaluating the code generation abilities of the latest LLMs in Russian. The benchmark includes 11 evaluation tasks spanning 8 programming languages. Our evaluation methodology features a taxonomy that outlines the practical coding skills models need to complete these tasks. The benchmark comprises an open-source codebase for users to conduct MERA assessments, a scoring system compatible with various programming environments, and a platform featuring a leaderboard and submission system. We evaluate open LLMs and frontier API models, analyzing their limitations on practical coding tasks in non-English languages. We are publicly releasing MERA Code to guide future research, anticipate groundbreaking features in model development, and standardize evaluation procedures.
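The gap between "high-level reasoning" benchmarks and evaluating executable code comes down to actually running model output against tests. The harness below is a generic sketch of that execution-based scoring pattern, not MERA Code's actual scorer; the entry-point name `solution` is an invented convention.

```python
def passes_tests(candidate_src: str, tests, func_name: str = "solution") -> bool:
    """Execute a generated snippet and check it against (args, expected)
    test cases. Real harnesses sandbox this step; exec() on untrusted
    model output is unsafe outside an isolated environment."""
    ns: dict = {}
    try:
        exec(candidate_src, ns)        # define the candidate function
        fn = ns[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False                   # crashes and wrong output both fail

def pass_at_1(candidates, tests) -> float:
    """Fraction of model samples whose code runs and is correct."""
    return sum(passes_tests(c, tests) for c in candidates) / len(candidates)
```

A score computed this way measures whether code actually executes correctly, which is exactly the property the abstract says reasoning-oriented benchmarks miss.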
- North America > United States (0.28)
- Europe > Austria (0.28)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
Trustless Federated Learning at Edge-Scale: A Compositional Architecture for Decentralized, Verifiable, and Incentive-Aligned Coordination
Onobhayedo, Pius, Oamen, Paul Osemudiame
Artificial intelligence is retracing the Internet's path from centralized provision to distributed creation. Initially, resource-intensive computation concentrates within institutions capable of training and serving large models. Eventually, as federated learning matures, billions of edge devices holding sensitive data will be able to collectively improve models without surrendering raw information, enabling both contribution and consumption at scale. This democratic vision remains unrealized due to several compositional gaps: aggregators handle updates without accountability; economic mechanisms are lacking and, even when present, remain vulnerable to gaming; coordination serializes state modifications, limiting scalability; and governance permits retroactive manipulation. This work addresses these gaps by leveraging cryptographic receipts to prove aggregation correctness, geometric novelty measurement to prevent incentive gaming, parallel object ownership to achieve linear scalability, and time-locked policies to check retroactive manipulation. The product of this work is a design architecture, not an actual implementation, that seeks to pass the baton in the race toward truly collaborative intelligence: an intelligence of the people, by the people, for the people.
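Two of the four mechanisms named in the abstract, cryptographic receipts and geometric novelty measurement, can be sketched concretely. The code below is an illustrative reading of those ideas only (the paper specifies an architecture, not an implementation), with a hash-based receipt standing in for a full cryptographic proof.

```python
import hashlib
import numpy as np

def aggregation_receipt(updates):
    """Aggregate client updates and emit a receipt: a hash binding the
    inputs to the output, so any holder of the updates can re-verify
    that the aggregator computed what it claims."""
    agg = np.mean(updates, axis=0)
    h = hashlib.sha256()
    for u in updates:
        h.update(u.tobytes())
    h.update(agg.tobytes())
    return agg, h.hexdigest()

def verify_receipt(updates, agg, receipt) -> bool:
    """Recompute the aggregation and compare both result and hash."""
    agg2, r2 = aggregation_receipt(updates)
    return np.array_equal(agg, agg2) and r2 == receipt

def novelty(update, history) -> float:
    """Geometric novelty: 1 minus the max cosine similarity to past
    updates; replaying an old update scores near zero, blunting one
    obvious way to game a per-contribution reward."""
    if not history:
        return 1.0
    sims = [float(update @ h_ /
                  (np.linalg.norm(update) * np.linalg.norm(h_) + 1e-12))
            for h_ in history]
    return 1.0 - max(sims)
```

A real system would replace the hash with a verifiable computation proof and the cosine test with a richer geometric measure, but the incentive logic (pay for directions not yet explored, make aggregation re-checkable) is the same.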
- Asia (0.67)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (0.68)
- Law (0.67)
Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures
Wang, Haohui, Qi, Jingyuan, Chen, Jianpeng, Wu, Jun, Huang, Lifu, Zheng, Lecheng, Choi, Kevin, Veeramani, Balaji, Bowen, Edward, Hu, Alison, Cody, Tyler, Zhou, Dawei
The rapid progress of large language models (LLMs) is fueled by a growing reliance on datasets that blend real and synthetic data. While synthetic data offers scalability and cost-efficiency, it often introduces systematic distributional discrepancies, particularly underrepresenting long-tail knowledge due to truncation effects from data generation mechanisms such as top-p sampling, temperature scaling, and finite sampling. These discrepancies pose fundamental challenges for characterizing and evaluating the utility of mixed real-synthetic datasets. In this paper, we identify a three-phase scaling behavior characterized by two breakpoints that mark transitions in how models learn head versus tail knowledge. We further derive an LLM generalization bound designed for real-synthetic mixtures, revealing several key factors that govern their generalization performance. Building on our theoretical findings, we propose an effective yet efficient data valuation method that scales to large datasets. Comprehensive experiments across four tasks, including image classification, sentiment classification, instruction following, and complex reasoning, demonstrate that our method surpasses state-of-the-art baselines in data valuation at significantly lower computational cost.
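The truncation effect the abstract attributes to top-p sampling can be demonstrated with a toy distribution: nucleus truncation cuts exactly the low-probability tail, so tail mass shrinks relative to the head. This is an illustrative toy, not the paper's method or data.

```python
import numpy as np

def top_p_truncate(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
    """Nucleus (top-p) truncation: keep the smallest high-probability
    prefix whose mass reaches p, drop the rest, and renormalize."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with mass >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

# A Zipf-like token distribution: a few head tokens, a long tail.
probs = 1.0 / np.arange(1, 1001) ** 1.2
probs /= probs.sum()
trunc = top_p_truncate(probs, p=0.9)

# The tail (tokens ranked beyond 100) loses relative mass after
# truncation, the systematic underrepresentation the abstract describes.
tail_before = probs[100:].sum()
tail_after = trunc[100:].sum()
```

Chaining generations through such a sampler compounds the loss, which is why synthetic-heavy mixtures underrepresent long-tail knowledge even when each generation step looks innocuous.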
- Europe (0.93)
- North America > United States > California (0.28)