Revenue maximization via machine learning with noisy data

Neural Information Processing Systems

Increasingly, copious amounts of consumer data are used to learn high-revenue mechanisms via machine learning. Existing research on mechanism design via machine learning assumes that there is a distribution over the buyers' values for


TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training

Menezes, Michael, Su, Barbara, Feng, Xinze, Farhat, Yehya, Shili, Hamza, Kyrillidis, Anastasios

arXiv.org Artificial Intelligence

We introduce TwIST, a distributed training framework for efficient large language model (LLM) sparsification. TwIST trains multiple subnetworks in parallel, periodically aggregates their parameters, and resamples new subnetworks during training. This process identifies high-quality subnetworks ("golden tickets") without requiring post-training procedures such as calibration or Hessian-based recovery. As a result, TwIST enables zero-cost pruning at deployment time while achieving perplexity competitive with state-of-the-art post-training sparsification methods. The benefits are most pronounced under aggressive sparsity (e.g., 50%+), where TwIST significantly outperforms baseline methods; for example, reaching 23.14 PPL compared to 31.64 for the closest prior approach. Unlike unstructured pruning, TwIST produces structured, dense matrices that offer practical inference speedups and memory reductions on commodity hardware (e.g., CPUs) that do not support efficient sparse computation. TwIST provides an efficient training-time path to deployable sparse LLMs without additional fine-tuning or recovery overhead.
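To make the training procedure described above more concrete, here is a minimal sketch of a TwIST-style loop: parallel workers each train a sampled subnetwork, parameters are periodically aggregated back into the dense model, and fresh subnetworks are resampled. This is an illustrative outline only, not the authors' implementation; the helpers sample_subnetwork_mask, apply_mask_to_grads, and aggregate_overlapping_params, as well as the worker count and aggregation period, are hypothetical.

```python
# Illustrative sketch of a TwIST-style training loop (not the authors' code).
# Assumes a HuggingFace-style model where model(**batch) returns an object
# with a .loss attribute; the mask/aggregation helpers are hypothetical.
import copy
import torch

def twist_train(model, data_loader, num_workers=4, resample_every=100, steps=1000):
    masks = [sample_subnetwork_mask(model) for _ in range(num_workers)]       # hypothetical
    workers = [copy.deepcopy(model) for _ in range(num_workers)]
    opts = [torch.optim.AdamW(w.parameters(), lr=1e-4) for w in workers]

    for step, batch in zip(range(steps), data_loader):
        # Each worker updates only the parameters inside its current subnetwork.
        for w, mask, opt in zip(workers, masks, opts):
            loss = w(**batch).loss
            loss.backward()
            apply_mask_to_grads(w, mask)    # hypothetical: zero grads outside the subnetwork
            opt.step()
            opt.zero_grad()

        if (step + 1) % resample_every == 0:
            # Periodically aggregate overlapping parameters into the dense model,
            # then resample new subnetworks and redistribute the merged weights.
            aggregate_overlapping_params(model, workers, masks)               # hypothetical
            masks = [sample_subnetwork_mask(model) for _ in range(num_workers)]
            for w in workers:
                w.load_state_dict(model.state_dict())
    return model
```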



CompLLM: Compression for Long Context Q&A

Berton, Gabriele, Unnikrishnan, Jayakrishnan, Tran, Son, Shah, Mubarak

arXiv.org Artificial Intelligence

While soft context compression methods, which map input text to smaller latent representations, have shown promise, their real-world adoption is limited. Existing techniques typically compress the context as a single unit, which leads to quadratic compression complexity and an inability to reuse computations across queries with overlapping contexts. In this work, we introduce CompLLM, a soft compression technique designed for practical deployment. Instead of processing the context holistically, CompLLM divides it into segments and compresses each one independently. This simple design choice yields three critical properties: efficiency, as the compression step scales linearly with the context length; scalability, enabling models trained on short sequences (e.g., 1k tokens) to generalize to contexts of 100k tokens; and reusability, allowing compressed segments to be cached and reused across different queries. Our experiments show that with a 2x compression rate, at high context lengths CompLLM speeds up Time To First Token (TTFT) by up to 4x and reduces the KV cache size by 50%. Furthermore, CompLLM achieves performance comparable to that obtained with the uncompressed context, and even surpasses it on very long sequences, demonstrating its effectiveness and practical utility. LOFT is a long context benchmark (128k tokens) designed to stress-test the long context capabilities of frontier LLMs such as Gemini 1.5 Pro, GPT-4o, and Claude 3 Opus. With CompLLM we show that we can improve the long context capabilities of much smaller open-source LLMs.

Figure 1: At high context lengths, CompLLM leads to considerable speedup and improved results, without requiring any modification or tuning of the LLM, by efficiently reducing the number of embeddings fed to the LLM. The plot shows the Time To First Token (TTFT) with CompLLM and without it (i.e. with a standard pipeline) as a function of context length.

Among the many use cases of LLMs, one of the most popular is long context Q&A: given a textual context of arbitrary length, the LLM should answer questions about it. Applications include coding assistants reading large codebases (Team, 2024), web agents reasoning on HTML pages (Zeng et al., 2024), users querying an LLM about a set of documents (Liu et al., 2024a), or RAG systems. Due to the quadratic complexity of the transformer (Vaswani et al., 2017), processing long contexts can be prohibitively expensive; it is therefore important to reduce computational complexity, especially as contexts grow longer and longer.
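The segment-wise design described above lends itself to a simple caching scheme. The sketch below is illustrative only, assuming a hypothetical compress_segment encoder that maps a text segment to a short sequence of latent embeddings; caching is keyed on segment content so queries with overlapping contexts can reuse previously compressed segments.

```python
# Illustrative sketch of segment-wise context compression (not the CompLLM code).
# Assumption: compress_segment(text) returns a list of latent embeddings for one segment.
import hashlib

_segment_cache = {}

def compress_context(context: str, segment_len_chars: int = 4000):
    """Split the context into fixed-size segments, compress each independently,
    and concatenate the results. Independent segments give linear scaling in
    context length and let compressed segments be cached across queries."""
    segments = [context[i:i + segment_len_chars]
                for i in range(0, len(context), segment_len_chars)]
    compressed = []
    for seg in segments:
        key = hashlib.sha256(seg.encode()).hexdigest()
        if key not in _segment_cache:
            _segment_cache[key] = compress_segment(seg)  # hypothetical encoder call
        compressed.extend(_segment_cache[key])
    return compressed
```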



Principled Foundations for Preference Optimization

Zhou, Wenxuan, Zhang, Shujian, Magdalou, Brice, Lambert, John, Amid, Ehsan, Nock, Richard, Hard, Andrew

arXiv.org Artificial Intelligence

The connection is established for all of Savage's losses, which allows the DPO framework to be generalized in its functional parts (Alfano et al., 2025; Azar et al., 2024; Chen et al.). The latter involves elements from Doignon-Falmagne's stochastic choice theory. These many design elements lead to a generalization that makes the most of the connection: we encompass all of properness on Savage's side (regardless of optional properties like symmetry), as well as all of the modelling power on the Krantz, Luce, Suppes and Tversky side. Notably, our level of generalization is able to support "for free" important extensions of DPO. This is an important task because DPO was designed with the objective of simplifying RLHF, and getting "above" DPO is mandatory to improve results by gaining more freedom on reward shapes, trajectories and preference behaviours (Gupta et al., 2025). One perhaps unexpected pitfall comes from the notion of a "gold" reward inherited from RLHF/DPO. To preserve readability, all proofs are given in an appendix. We adopt many definitions from Rafailov et al. (2023).
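For context, the DPO objective of Rafailov et al. (2023), whose definitions the paper adopts, is the usual starting point for this line of work; in standard notation (policy \pi_\theta, reference policy \pi_{\mathrm{ref}}, preferred and dispreferred responses y_w and y_l, temperature \beta), it reads:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) \;=\; -\,\mathbb{E}_{(x, y_w, y_l)\sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]

The paper's contribution, as described above, is to situate and generalize this objective within Savage's theory of proper losses and the Doignon-Falmagne stochastic-choice framework; the formula shown here is the standard form, not the paper's generalized one.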


These centuries-old equations predict flowing fluid – until they don't

New Scientist

The following is an extract from our Lost in Space-Time newsletter. Each month, we hand over the keyboard to a physicist or mathematician to tell you about fascinating ideas from their corner of the universe. You can sign up for Lost in Space-Time here. The Navier-Stokes equations have been used to model the flow of fluids for almost 200 years – but we still don't really understand them. This can often feel a little odd, especially as we rely on these equations every day to help build rockets, design drugs and understand climate change. But here is where you have to think like a mathematician.


Artificial Finance: How AI Thinks About Money

Erdem, Orhan, Ashok, Ragavi Pobbathi

arXiv.org Artificial Intelligence

In this paper, we explore how large language models (LLMs) approach financial decision-making by systematically comparing their responses to those of human participants across the globe. We posed a set of commonly used financial decision-making questions to seven leading LLMs, including five models from the GPT series (GPT-4o, GPT-4.5, o1, o3-mini), Gemini 2.0 Flash, and DeepSeek R1. We then compared their outputs to human responses drawn from a dataset covering 53 nations. Our analysis reveals three main results. First, LLMs generally exhibit a risk-neutral decision-making pattern, favoring choices aligned with expected value calculations when faced with lottery-type questions. Second, when evaluating trade-offs between present and future, LLMs occasionally produce responses that appear inconsistent with normative reasoning. Third, when we examine cross-national similarities, we find that the LLMs' aggregate responses most closely resemble those of participants from Tanzania. These findings contribute to the understanding of how LLMs emulate human-like decision behaviors and highlight potential cultural and training influences embedded within their outputs.
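To make the two kinds of comparison concrete: a risk-neutral decision maker ranks a lottery by its expected value, and a present-vs-future choice by discounting the later payoff. The sketch below is a minimal illustration with hypothetical numbers, not taken from the paper's questionnaire.

```python
# Minimal illustration of risk-neutral and intertemporal choice
# (all numbers are hypothetical, not from the paper's questionnaire).

def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

# Lottery-type question: a risk-neutral agent compares expected values.
lottery = [(0.5, 200.0), (0.5, 0.0)]      # 50% chance of 200, otherwise 0
sure_amount = 90.0
prefers_lottery = expected_value(lottery) > sure_amount   # 100.0 > 90.0 -> True

# Present-vs-future trade-off: discount the future payoff to today's value.
def present_value(amount, annual_rate, years):
    return amount / (1 + annual_rate) ** years

prefers_waiting = present_value(120.0, 0.05, 1) > 100.0   # ~114.3 > 100 -> True

print(prefers_lottery, prefers_waiting)
```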


Will artificial agents pursue power by default?

Tarsney, Christian

arXiv.org Artificial Intelligence

Researchers worried about catastrophic risks from advanced AI have argued that we should expect sufficiently capable AI agents to pursue power over humanity because power is a convergent instrumental goal, something that is useful for a wide range of final goals. Others have recently expressed skepticism of these claims. This paper aims to formalize the concepts of instrumental convergence and power-seeking in an abstract, decision-theoretic framework, and to assess the claim that power is a convergent instrumental goal. I conclude that this claim contains at least an element of truth, but might turn out to have limited predictive utility, since an agent's options cannot always be ranked in terms of power in the absence of substantive information about the agent's final goals. However, the fact of instrumental convergence is more predictive for agents who have a good shot at attaining absolute or near-absolute power.