A Sustainable AI Economy Needs Data Deals That Work for Generators
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults--missing provenance, asymmetric bargaining power, and nondynamic pricing--as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.
693e00827fd44bdfca210801fe1e6439-Paper-Position_Paper_Track.pdf
The meteoric rise of Artificial Intelligence (AI), with its rapidly expanding market capitalization, presents both transformative opportunities and critical challenges. Chief among these is the urgent need for a new, unified paradigm for trustworthy evaluation, as current benchmarks increasingly reveal critical vulnerabilities. Issues like data contamination and selective reporting by model developers fuel hype, while inadequate data quality control can lead to biased evaluations that, even if unintentionally, may favor specific approaches. As a flood of participants enters the AI space, this "Wild West" of assessment makes distinguishing genuine progress from exaggerated claims exceptionally difficult. Such ambiguity blurs scientific signals and erodes public confidence, much as unchecked claims would destabilize financial markets reliant on credible oversight from agencies like Moody's. In high-stakes human examinations (e.g., SAT, GRE), substantial effort is devoted to ensuring fairness and credibility; why settle for less in evaluating AI, especially given its profound societal impact? This position paper argues that a laissez-faire approach is untenable. For true and sustainable AI advancement, we call for a paradigm shift to a unified, live, and quality-controlled benchmarking framework--robust by construction rather than reliant on courtesy or goodwill.
Flood of AI 'garbage' is pushing open-source developers to the limit
A viral cartoon about open-source software shows a teetering pile of boxes labelled "all modern digital infrastructure" and one tiny box right at the bottom, propping up the whole lot: "a project some random person in Nebraska has been thanklessly maintaining since 2003". That's the reality of open source: every website, application and operating system relies on it. Modern society couldn't function without it, and yet it's written by volunteers in their spare time. But the growing burden caused by a flood of AI-generated code is causing many to burn out and leave the community altogether, threatening the future of open-source software. AI models are making it easier and easier to generate code to build new features, fix bugs or create entire new projects at the click of a button.
SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes
Most medical observational studies estimate causal treatment effects using electronic health records (EHR), where a patient's covariates and outcomes are both observed longitudinally. However, previous methods focus only on adjusting for the covariates while neglecting the temporal structure in the outcomes. To bridge the gap, this paper develops a new method, SyncTwin, that learns a patient-specific time-constant representation from the pre-treatment observations. SyncTwin issues a counterfactual prediction for a target patient by constructing a synthetic twin that closely matches the target in representation. The reliability of the estimated treatment effect can be assessed by comparing the observed and synthetic pre-treatment outcomes. Medical experts can interpret the estimate by examining the individuals who contribute most to the synthetic twin. In the real-data experiment, SyncTwin successfully reproduced the findings of a randomized controlled clinical trial using observational data, which demonstrates its usability in complex real-world EHR.
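The matching step described in this abstract can be illustrated with a minimal sketch. It assumes the representations are already given as fixed vectors (the abstract says they are learned, which is not shown here) and that the twin is a convex combination of untreated patients, which is an assumption borrowed from synthetic-control methods rather than something the abstract states. The function names, the projected-gradient solver, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                    # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]    # number of entries kept positive
    theta = (css[rho - 1] - 1.0) / rho      # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def fit_twin_weights(control_reps, target_rep, n_iter=2000):
    """Hypothetical helper: weights over control patients whose weighted
    average representation is as close as possible to the target's."""
    control_reps = np.asarray(control_reps, dtype=float)  # (n_controls, dim)
    target_rep = np.asarray(target_rep, dtype=float)      # (dim,)
    n = control_reps.shape[0]
    w = np.full(n, 1.0 / n)                                # start from uniform weights
    # Step size from the Lipschitz constant of the squared-error gradient.
    lipschitz = 2.0 * np.linalg.norm(control_reps, 2) ** 2
    step = 1.0 / max(lipschitz, 1e-12)
    for _ in range(n_iter):
        grad = 2.0 * control_reps @ (w @ control_reps - target_rep)
        w = project_to_simplex(w - step * grad)            # projected gradient step
    return w
```

Under these assumptions, a counterfactual outcome for the target would be the weighted average of the control patients' outcomes (`w @ control_outcomes`), the largest entries of `w` point to the individuals who contribute most to the twin, and the gap between the observed and weighted pre-treatment outcomes gives a rough check on how far the estimate can be trusted.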
Auditing the Auditors: Does Community-based Moderation Get It Right?
Alimohammadi, Yeganeh, Huang, Karissa, Borgs, Christian, Chayes, Jennifer
Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted consensus-based auditing that ties users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to the latent-factor model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance while avoiding penalization of disagreement.
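The two-stage idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract mentions a latent-factor model, whereas this sketch swaps in a simple additive model (global mean plus item effect plus contributor offset) as a stand-in. The function names, the inverse-variance weighting rule, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import numpy as np


def residual_stability_weights(ratings, eps=1e-6):
    """ratings: 2-D array, rows = contributors, columns = items, NaN = not rated.
    Returns (normalized weights, per-contributor offsets)."""
    ratings = np.asarray(ratings, dtype=float)
    global_mean = np.nanmean(ratings)
    # Stage 1: account for differences across content and contributors.
    item_effect = np.nanmean(ratings, axis=0) - global_mean
    centered = ratings - global_mean - item_effect[None, :]
    contributor_offset = np.nanmean(centered, axis=1)
    residuals = centered - contributor_offset[:, None]
    # Stage 2: contributors with stable (low-variance) residuals get more weight.
    # A constant offset from the majority does not reduce the weight.
    residual_var = np.nanvar(residuals, axis=1)
    weights = 1.0 / (residual_var + eps)
    return weights / weights.sum(), contributor_offset


def aggregate(ratings, weights, contributor_offset):
    """Weighted average of offset-corrected ratings for each item."""
    ratings = np.asarray(ratings, dtype=float)
    observed = ~np.isnan(ratings)
    corrected = np.where(observed, ratings - contributor_offset[:, None], 0.0)
    w = weights[:, None] * observed          # zero weight where no rating exists
    return (w * corrected).sum(axis=0) / w.sum(axis=0)
```

In this sketch a contributor who is consistently harsher than everyone else keeps the same weight as a contributor who sits at the consensus, while a contributor whose ratings are erratic relative to the fitted model loses influence. That mirrors the abstract's goal of not penalizing disagreement as such.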
Wikipedia's Existential Threats Feel Greater Than Ever
As the free online encyclopedia turns 25, it's facing political opposition, AI scraping, dwindling volunteers, and a public that may no longer believe in its ideals. In 2010, the FBI sent Wikipedia a letter that would be intimidating for any organization to receive. The missive demanded that the free online encyclopedia remove the FBI's logo from an entry about the agency, claiming that reproducing the emblem was illegal and punishable with fines, imprisonment, "or both." Rather than back down, a lawyer for the Wikimedia Foundation, which hosts Wikipedia, shot back a sharp refusal outlining how the FBI's interpretation of the relevant statute was incorrect and saying that Wikipedia was "prepared to argue our view in court." It worked--the FBI dropped the matter.
Contributor: Rob Reiner reshaped how California understands and invests in children
Hollywood director Rob Reiner engineered Proposition 10, a 1998 tobacco tax that created First 5 California, generating more than $11 billion for early childhood programs statewide. After his tragic death Sunday, the world remembers Rob Reiner as a cinematic force -- and he was one, as an unforgettable presence on the ambitious 1970s sitcom "All in the Family" and later as the director of beloved films. I came to know him differently: as a restless thinker who transformed his own life story into bold public policy, reshaping how California understands and invests in its youngest children.
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
Sinha, Samridhi Raj, Sheth, Rajvee, Upperwal, Abhishek, Singh, Mayank
The rapid evolution of Large Language Models has underscored the need for evaluation frameworks that are globally applicable, flexible, and modular, and that support a wide range of tasks, model types, and linguistic settings. We introduce EKA-EVAL, a unified, end-to-end framework that combines a zero-code web interface and an interactive CLI to ensure broad accessibility. It integrates 50+ multilingual benchmarks across nine evaluation categories, supports local and proprietary models, and provides 11 core capabilities through a modular, plug-and-play architecture. Designed for scalable, multilingual evaluation with support for low-resource languages, EKA-EVAL is, to the best of our knowledge, the first suite to offer comprehensive coverage in a single platform. Comparisons against five existing baselines indicate improvements of at least 2x on key usability measures, with the highest user satisfaction, faster setup times, and consistent benchmark reproducibility. The framework is open-source and publicly available at https://github.com/lingo-iitgn/eka-eval.