 expenditure



Sequential Bayesian Experimental Design with Variable Cost Structure

Neural Information Processing Systems

While theoretically appealing, MI evaluation poses a significant computational burden for most real-world applications. As a result, many algorithms utilize MI bounds as proxies that lack regret-style guarantees. Here, we utilize two-sided bounds to provide such guarantees.


ParaScopes: What do Language Models Activations Encode About Future Text?

arXiv.org Artificial Intelligence

Interpretability studies in language models often investigate forward-looking representations of activations. However, as language models become capable of doing ever longer time horizon tasks, methods for understanding activations often remain limited to testing specific concepts or tokens. We develop a framework of Residual Stream Decoders as a method of probing model activations for paragraph-scale and document-scale plans. We test several methods and find information can be decoded equivalent to 5+ tokens of future context in small models. These results lay the groundwork for better monitoring of language models and better understanding how they might encode longer-term planning information.


Generative AI and Firm Productivity: Field Experiments in Online Retail

arXiv.org Artificial Intelligence

We quantify the impact of Generative Artificial Intelligence (GenAI) on firm productivity through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over six months in 2023-2024, GenAI-based enhancements were integrated into seven consumer-facing business workflows. We find that GenAI adoption significantly increases sales, with treatment effects ranging from $0\%$ to $16.3\%$, depending on GenAI's marginal contribution relative to existing firm practices. Because inputs and prices were held constant across experimental arms, these gains map directly into total factor productivity improvements. Across the four GenAI applications with positive effects, the implied annual incremental value is approximately $\$5$ per consumer, an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The primary mechanism operates through higher conversion rates, consistent with GenAI reducing frictions in the marketplace and improving consumer experience. We also document substantial heterogeneity: smaller and newer sellers, as well as less experienced consumers, exhibit disproportionately larger gains. Our findings provide novel, large-scale causal evidence on the productivity effects of GenAI in online retail, highlighting both its immediate value and broader potential.


The Feasibility of Training Sovereign Language Models in the Global South: A Study of Brazil and Mexico

arXiv.org Artificial Intelligence

The rapid escalation of computational requirements for training large-scale language models has reinforced structural asymmetries between high-capacity jurisdictions and countries in the Global South. This paper examines the technical and fiscal feasibility of sovereign-scale language model training in Brazil and Mexico under conditions of constrained hardware access, energy availability, and fiscal ceilings. Using a dual-axis design that varies accelerator generation (NVIDIA H100 vs. A100) and training duration (90 vs. 150 days), we estimate compute demand, energy consumption, capital expenditures, and regulatory compatibility for the training of a 10-trillion-token model. Our findings show that while all configurations remain below export-control and electrical infrastructure thresholds, fiscal viability is determined by hardware efficiency. H100-based scenarios achieve training feasibility at a total cost of 8-14 million USD, while A100 deployments require 19-32 million USD due to higher energy and hardware demand. We argue that extending training timelines should be treated as a policy lever to mitigate hardware constraints, enabling the production of usable, auditable, and locally aligned models without competing at the global frontier. This study contributes to the discourse on AI compute governance and technological sovereignty by highlighting context-sensitive strategies that allow middle-income countries to establish sustainable and strategically sufficient AI capabilities.


Intelligent Healthcare Ecosystems: Optimizing the Iron Triangle of Healthcare (Access, Cost, Quality)

arXiv.org Artificial Intelligence

The United States spends more on healthcare than any other nation - nearly 17% of GDP as of the early 2020s - yet struggles with uneven access and outcomes [1] [2]. This paradox of high cost, variable quality, and inequitable access is often described by the "Iron Triangle" of healthcare [3], which posits that improvements in one dimension (access, cost, or quality) often come at the expense of the others. This paper explores how an Intelligent Healthcare Ecosystem (iHE) - an integrated system leveraging advanced technologies and data-driven innovation - can "bend" or even break this iron triangle, enabling simultaneous enhancements in access, cost-efficiency, and quality of care. We review historical and current trends in U.S. healthcare spending, including persistent waste and international comparisons, to underscore the need for transformative change. We then propose a conceptual model and strategic framework for iHE, incorporating emerging technologies such as generative AI and large language models (LLMs), federated learning, interoperability standards (FHIR) and nationwide networks (TEFCA), and digital twins. We introduce an updated healthcare value equation that integrates all three corners of the iron triangle, and we hypothesize that an intelligently coordinated ecosystem can maximize this value by delivering high-quality care to more people at lower cost. Methods include a narrative synthesis of recent literature and policy reports, and Results highlight key components and enabling technologies of an iHE. We discuss how such ecosystems can reduce waste, personalize care, enhance interoperability, and support value-based models, all while addressing challenges like privacy, bias, and stakeholder adoption. The paper is formatted per MDPI guidelines, with APA-style numbered references, illustrative figures (U.S. spending trends, waste breakdown, international spending comparison, conceptual models), equations, and a structured layout. Our findings suggest that embracing an Intelligent Healthcare Ecosystem is pivotal for optimizing the long-standing trade-offs in healthcare's iron triangle, moving towards a system that is more accessible, affordable, and of higher quality for all.


Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing

Neural Information Processing Systems

Crowdsourcing has gained immense popularity in machine learning applications for obtaining large amounts of labeled data. Crowdsourcing is cheap and fast, but suffers from the problem of low-quality data. To address this fundamental challenge in crowdsourcing, we propose a simple payment mechanism to incentivize workers to answer only the questions that they are sure of and skip the rest. We show that, surprisingly, under a mild and natural "no-free-lunch" requirement, this mechanism is the one and only incentive-compatible payment mechanism possible. We also show that among all possible incentive-compatible mechanisms (that may or may not satisfy no-free-lunch), our mechanism makes the smallest possible payment to spammers. Interestingly, this unique mechanism takes a "multiplicative" form. The simplicity of the mechanism is an added benefit. In preliminary experiments involving several hundred workers, we observe a significant reduction in the error rates under our unique mechanism for the same or lower monetary expenditure.


Big tech has spent $155bn on AI this year. It's about to spend hundreds of billions more

The Guardian

The US's largest companies have spent 2025 locked in a competition to spend more money than one another, lavishing $155bn on the development of artificial intelligence, more than the US government has spent on education, training, employment and social services in the 2025 fiscal year so far. Based on the most recent financial disclosures of Silicon Valley's biggest players, the race is about to accelerate to hundreds of billions in a single year. Over the past two weeks, Meta, Microsoft, Amazon, and Alphabet, Google's parent, have shared their quarterly public financial reports. Each disclosed that their year-to-date capital expenditure, a figure that refers to the money companies spend to acquire or upgrade tangible assets, already totals tens of billions. Capex, as the term is abbreviated, is a proxy for technology companies' spending on AI because the technology requires gargantuan investments in physical infrastructure, namely data centers, which require large amounts of power, water and expensive semiconductor chips.


Zuckerberg claims 'superintelligence is now in sight' as Meta lavishes billions on AI

The Guardian

Whether it's poaching top talent away from competitors, acquiring AI startups or proclaiming that it will build data centers the size of Manhattan, Meta has been on a spending spree to boost its artificial intelligence capabilities for months now. The massive splurge is paying off, according to Meta's chief executive. In a new memo posted on Wednesday ahead of the company's quarterly earnings report, Mark Zuckerberg describes his ambitions for developing what he calls "superintelligence". "Over the last few months we have begun to see glimpses of our AI systems improving themselves," Zuckerberg wrote. "The improvement is slow for now, but undeniable. Developing superintelligence is now in sight."


Chart Question Answering from Real-World Analytical Narratives

arXiv.org Artificial Intelligence

We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.