Large Language Model


A Game Plan for the AI Boom

The Atlantic - Technology

Ten years ago, AlphaGo trounced human competitors--and its legacy is still present in today's most advanced bots. Thore Graepel may have been the first human to be vanquished by a superintelligence. In 2015, on his first day as a researcher at Google DeepMind, he was challenged to play against the earliest iteration of AlphaGo--a computer program developed by DeepMind that would prove so effective at the ancient Chinese game of weiqi (or Go, as it is commonly known in the West) that it changed how humans play it, and then upended the field of AI itself. When Graepel faced it, AlphaGo was just a "baby" project, as he put it to me, and he was an accomplished amateur player. But it still took him down.


Conformal Selective Prediction with General Risk Control

arXiv.org Machine Learning

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over the cases in which the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded, and continuously valued risk. SCoRE offers two types of guarantees on the risk among "positive" cases, in which the system opts to trust the model. Built upon ideas from conformal inference and hypothesis testing, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. This property is ensured by data exchangeability, without requiring any modeling assumptions. Passing these e-values to hypothesis testing procedures yields binary trust decisions with finite-sample error control. SCoRE avoids the need for uniform concentration and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.
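The abstract does not name the hypothesis testing procedure that consumes the e-values. As a minimal sketch, assuming the e-Benjamini-Hochberg (e-BH) procedure is used (an assumption, not something the abstract confirms), turning per-case e-values into trust decisions could look like the following. The function name `e_bh_select` and the example e-values are hypothetical.

```python
def e_bh_select(e_values, alpha):
    """Return indices of cases to trust under the e-BH procedure at level alpha.

    e-BH sorts the e-values in decreasing order, finds the largest k such that
    the k-th largest e-value is at least m / (alpha * k), and selects those k cases.
    """
    m = len(e_values)
    if m == 0:
        return []
    # Indices ordered from largest to smallest e-value.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k  # keep the largest k that clears its threshold
    return sorted(order[:k_star])


# Hypothetical e-values for four test cases; in this sketch, larger values
# are treated as stronger evidence for trusting the model on that case.
selected = e_bh_select([50.0, 1.0, 30.0, 0.5], alpha=0.1)
print(selected)  # [0, 2]
```

Only the top-ranked cases whose e-values clear the rank-dependent threshold m / (alpha * k) are trusted; the system abstains on the rest.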


OpenAI Is Doing Everything … Poorly

The Atlantic - Technology

The company's sudden decision to pull the plug on Sora is a sign of deeper trouble. When I opened Sora this morning, I was met with a flood of strange and disturbing AI-generated videos. On OpenAI's video app, I scrolled through fabricated scenes of the Iran war and a barrage of fake Donald Trumps blabbering about Jeffrey Epstein. In my least favorite clip, I watched a man deep-fry an infant. The app lets users create fairly realistic-looking AI-generated clips--including of their own likeness--and then post them on a TikTok-like feed.


Demystifying Low-Rank Knowledge Distillation in Large Language Models: Convergence, Generalization, and Information-Theoretic Guarantees

arXiv.org Machine Learning

Knowledge distillation has emerged as a powerful technique for compressing large language models (LLMs) into efficient, deployable architectures while preserving their advanced capabilities. Recent advances in low-rank knowledge distillation, particularly methods like Low-Rank Clone (LRC), have demonstrated remarkable empirical success, achieving performance comparable to full-parameter distillation with significantly less training data and computational overhead. However, the theoretical foundations underlying these methods remain poorly understood. In this paper, we establish a rigorous theoretical framework for low-rank knowledge distillation in language models. We prove that, under mild assumptions, low-rank projection preserves the optimization dynamics, yielding explicit convergence rates of $O(1/\sqrt{T})$. We derive generalization bounds that characterize the fundamental trade-off between model compression and generalization capability, showing that the generalization error scales with the rank parameter as $O(r(m+n)/\sqrt{n})$. Furthermore, we provide an information-theoretic analysis of the activation cloning mechanism, revealing its role in maximizing the mutual information between the teacher's and student's intermediate representations. Our theoretical results offer principled guidelines for rank selection, suggesting an optimal rank $r^* = O(\sqrt{n})$, where $n$ is the sample size. Experimental validation on standard language modeling benchmarks confirms our theoretical predictions, demonstrating that the empirical convergence, rank scaling, and generalization behaviors align closely with our bounds.
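A rank-$r$ factorization $W \approx AB$ of an $m \times n$ matrix, with $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, has $r(m+n)$ parameters, the same expression that appears in the generalization bound above. The sketch below illustrates that count with a truncated SVD. It is a generic low-rank compression shown for intuition, not the LRC projection scheme itself, and the function name `low_rank_factor` is hypothetical.

```python
import numpy as np


def low_rank_factor(W, r):
    """Best rank-r approximation W ~ A @ B via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]  # m x r, singular values folded into the left factor
    B = Vt[:r, :]         # r x n
    return A, B


rng = np.random.default_rng(0)
m, n = 6, 4
W = rng.standard_normal((m, n))

for r in range(1, n + 1):
    A, B = low_rank_factor(W, r)
    n_params = A.size + B.size        # equals r * (m + n)
    err = np.linalg.norm(W - A @ B)   # Frobenius error, non-increasing in r
    print(r, n_params, round(float(err), 4))
```

Raising the rank trades more parameters for lower approximation error; at full rank the factorization reproduces `W` up to floating-point round-off.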


OpenAI shutters AI video generator Sora in abrupt announcement

The Guardian

Tech firm 'says goodbye' to Sora, made publicly available in 2024, just six months after its launch of a stand-alone app. In an abrupt announcement on Tuesday, OpenAI said it was "saying goodbye" to its AI video generator Sora. The move comes just six months after the company's splashy launch of a stand-alone app with which people could make and share hyper-realistic AI videos in a scrolling social feed. "To everyone who created with Sora, shared it, and built community around it: thank you," the company wrote in a post on X. "What you made with Sora mattered, and we know this news is disappointing." OpenAI first made Sora publicly available in late 2024, but it wasn't until the company launched Sora 2 and its stand-alone app last September that the video generator reached mainstream attention.


Understanding Behavior Cloning with Action Quantization

arXiv.org Machine Learning

Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative models. Autoregressive models such as transformers have proven remarkably effective, from large language models (LLMs) to vision-language-action systems (VLAs). However, applying autoregressive models to continuous control requires discretizing actions through quantization, a practice that is widely adopted yet poorly understood theoretically. This paper provides theoretical foundations for this practice. We analyze how quantization error propagates along the horizon and interacts with statistical sample complexity. We show that behavior cloning with quantized actions and log-loss achieves optimal sample complexity, matching existing lower bounds, and incurs only polynomial horizon dependence on quantization error, provided the dynamics are stable and the policy satisfies a probabilistic smoothness condition. We further characterize when different quantization schemes satisfy or violate these requirements, and propose a model-based augmentation that provably improves the error bound without requiring policy smoothness. Finally, we establish fundamental limits that jointly capture the effects of quantization error and statistical complexity.
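Uniform binning is the simplest way to turn a continuous action into a discrete token that an autoregressive policy can predict. As a minimal sketch, assuming a one-dimensional action clipped to a known range [low, high] (the paper studies quantization schemes more generally), each action maps to a bin index and is decoded to the bin center, so the per-step quantization error is at most half a bin width, (high - low) / (2 * num_bins). The helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(a, low, high, num_bins):
    """Map a continuous action to a bin index in {0, ..., num_bins - 1}."""
    a = min(max(a, low), high)        # clip to the action range
    width = (high - low) / num_bins
    idx = int((a - low) // width)
    return min(idx, num_bins - 1)     # a == high falls into the last bin


def dequantize(idx, low, high, num_bins):
    """Decode a bin index to the center of its bin."""
    width = (high - low) / num_bins
    return low + (idx + 0.5) * width


low, high, num_bins = -1.0, 1.0, 8
for a in [-1.0, -0.3, 0.0, 0.42, 1.0]:
    idx = quantize(a, low, high, num_bins)
    a_hat = dequantize(idx, low, high, num_bins)
    print(f"{a:+.2f} -> bin {idx} -> {a_hat:+.3f} (error {abs(a - a_hat):.3f})")
```

Doubling `num_bins` halves this worst-case error but enlarges the token vocabulary the policy must model, which is the kind of trade-off between quantization error and statistical complexity that the abstract describes.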


User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

arXiv.org Machine Learning

Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate their preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces both timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
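The abstract does not spell out VARS's scoring or update rules. As a hypothetical sketch only, assuming an additive bias and a reward-weighted update (neither is confirmed by the source), dual user vectors could steer retrieval over preference memory as follows: each memory item's base relevance is shifted by its dot product with the long-term and short-term vectors, and a scalar reward nudges both vectors toward the embeddings of the retrieved items, with the short-term vector moving faster and decaying between updates. All names, memory items, and hyperparameters (`beta`, `lr_long`, `lr_short`, `decay`) are illustrative.

```python
from dataclasses import dataclass


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


@dataclass
class UserVectors:
    long_term: list   # slow-moving, persists across sessions
    short_term: list  # fast-moving, decays toward zero between updates


def biased_score(base_sim, item_vec, user, beta=1.0):
    """Base query-item similarity plus a user-preference bias."""
    bias = dot(user.long_term, item_vec) + dot(user.short_term, item_vec)
    return base_sim + beta * bias


def retrieve(memory, user, k=2):
    """memory: list of (item_id, base_sim, item_vec). Returns the top-k item ids."""
    ranked = sorted(memory, key=lambda m: biased_score(m[1], m[2], user), reverse=True)
    return [item_id for item_id, _, _ in ranked[:k]]


def update(user, retrieved_vecs, reward, lr_long=0.05, lr_short=0.5, decay=0.5):
    """Move both vectors along the mean retrieved embedding, scaled by the reward."""
    dim = len(user.long_term)
    mean = [sum(v[i] for v in retrieved_vecs) / len(retrieved_vecs) for i in range(dim)]
    for i in range(dim):
        user.long_term[i] += lr_long * reward * mean[i]
        user.short_term[i] = decay * user.short_term[i] + lr_short * reward * mean[i]


# Usage with hypothetical preference-memory items in a 2-D preference space.
user = UserVectors(long_term=[0.0, 0.0], short_term=[0.0, 0.0])
memory = [
    ("prefers_concise", 0.50, [1.0, 0.0]),
    ("prefers_step_by_step", 0.55, [0.0, 1.0]),
    ("uses_python", 0.40, [0.5, 0.5]),
]
print(retrieve(memory, user, k=1))       # ['prefers_step_by_step']
update(user, [[1.0, 0.0]], reward=1.0)  # positive feedback on a concise answer
print(retrieve(memory, user, k=1))       # ['prefers_concise']
```

Because the backbone stays frozen, only these small per-user vectors change, which matches the abstract's point that personalization happens without per-user fine-tuning.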


Generalized Discrete Diffusion from Snapshots

arXiv.org Machine Learning

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with a clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: https://oussamazekri.fr/gdds.
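Uniformization is a standard way to simulate a continuous-time Markov chain (CTMC): pick a rate lam at least as large as every state's total exit rate, draw the number of jump events from a Poisson distribution with mean lam * t, and apply the discrete transition matrix P = I + Q / lam at each event. The sketch below uses it to sample a noised token x_t directly from x_0 for a small rate matrix Q. It illustrates the general technique rather than GDDS's exact forward process; the uniform-noise Q and all helper names are assumptions for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def uniform_rate_matrix(K, beta):
    """CTMC generator that jumps to each other state at rate beta / K."""
    return [[-beta * (K - 1) / K if i == j else beta / K for j in range(K)]
            for i in range(K)]


def noise_by_uniformization(x0, Q, t, rng):
    """Sample x_t given x_0 for the CTMC with generator Q, via uniformization."""
    K = len(Q)
    lam = max(-Q[i][i] for i in range(K))  # dominates every exit rate
    if lam == 0:
        return x0  # no transitions are possible
    x = x0
    for _ in range(sample_poisson(lam * t, rng)):
        # Row x of P = I + Q / lam; clamp any tiny negative round-off to zero.
        weights = [max(0.0, (1.0 if j == x else 0.0) + Q[x][j] / lam) for j in range(K)]
        x = rng.choices(range(K), weights=weights)[0]
    return x


rng = random.Random(0)
Q = uniform_rate_matrix(K=4, beta=1.0)
samples = [noise_by_uniformization(0, Q, t=10.0, rng=rng) for _ in range(4000)]
print([samples.count(s) / len(samples) for s in range(4)])  # roughly uniform at large t
```

The appeal of this construction is that only a Poisson draw and a handful of discrete jumps are needed to reach time t, so any generator Q can be plugged in without solving the matrix exponential exp(tQ).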


Meet the Gods of AI Warfare

WIRED

In its early days, the AI initiative known as Project Maven had its fair share of skeptics at the Pentagon. Today, many of them are true believers. The rise of AI warfare speaks to the biggest moral and practical question there is: Who--or what--gets to decide to take a human life? And who bears that cost? In 2018, more than 3,000 Google workers protested the company's involvement in "the business of war" after finding out the company was part of Project Maven, then a nascent Pentagon effort to use computer vision to rifle through copious video footage taken in America's overseas drone wars. They feared Project Maven's AI could one day be used for lethal targeting. In my yearslong effort to uncover the full story of Project Maven for my book, I learned that is exactly what happened, and that the undertaking was just as controversial inside the Pentagon. Today, the tool known as Maven Smart System is being used in US operations against Iran. How the US military's top brass went from skeptics of AI in war to true believers has a lot to do with a Marine colonel named Drew Cukor. In early September 2024, during the cocktail hour at a private retreat for tech investors and defense leaders, Vice Admiral Frank "Trey" Whitworth found his way to Drew Cukor. Now Project Maven's founding leader and his skeptical successor were standing face-to-face. Three years earlier, Whitworth had been the Pentagon's top military official for intelligence, advising the chairman of the Joint Chiefs of Staff and running one of the most sensitive and potentially lethal parts of any military process: targeting.


The AI Race Is Pressuring Utilities to Squeeze More From Europe's Power Grids

WIRED

As data center developers queue up to connect to power grids across Europe, network operators are experimenting with novel ways of clearing room for them. European countries are racing to bring new data centers online as AI labs across the globe continue to demand more compute. The primary limiting factor is energy--and specifically, the ability to move it. Though Europe is on track to generate enough energy, utility experts say, grid operators broadly lack the infrastructure needed to transport it to where it needs to go. That's throttling grid capacity and, by extension, the number of new power-hungry data centers that can connect without risking blackouts.