e3251075554389fe91d17a794861d47b-Paper.pdf

Neural Information Processing Systems

This perspective parallels an earlier phenomenon in the much better understood field of optimization, where convexity has played a preponderant role in both theoretical and methodological advances [Nes04; Bub15].


Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Neural Information Processing Systems

In addition, we need only assumptions weaker than those of existing algorithms: our algorithms work on discrete action sets as well as continuous ones, without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available.
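The linear optimization oracle mentioned above is simply a routine that, given a direction vector, returns the action minimizing the inner product over the action set. A minimal sketch for a discrete action set (illustrative only, not the paper's algorithm; all names here are hypothetical):

```python
import numpy as np

def linear_opt_oracle(actions, direction):
    """Linear optimization oracle over a finite action set:
    return the action a minimizing <direction, a>.
    (Illustrative sketch for a discrete action set.)"""
    actions = np.asarray(actions, dtype=float)
    scores = actions @ direction  # inner product with each action
    return actions[int(np.argmin(scores))]

# Example: standard-basis actions in R^3 with an estimated loss vector.
actions = np.eye(3)
loss_estimate = np.array([0.5, 0.2, 0.9])
best = linear_opt_oracle(actions, loss_estimate)  # picks the cheapest coordinate
```

For continuous action sets the same interface applies, with the argmin taken over the set (e.g. by an LP solver for a polytope).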


3 Common Misunderstandings About AI in 2025

TIME - Tech

Children and parked cars are color-coded on a monitor inside a Mercedes-Benz S-Class during an autonomous driving and AI demonstration in Immendingen, Germany, on July 17, 2018. In 2025, misconceptions about AI flourished as people struggled to make sense of the rapid development and adoption of the technology. Here are three popular ones to leave behind in the New Year. When GPT-5 was released in May, people wondered (not for the first time) if AI was hitting a wall.


A Universal Law of Robustness via Isoperimetry

Neural Information Processing Systems

Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in the current practice of deep learning is that models are trained with many more parameters than this classical theory would suggest. We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial-size weights and any covariate distribution verifying isoperimetry. In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li, and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.
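The parameter-counting claim in the abstract can be paraphrased as a scaling law; this is an informal summary (the paper's precise statement carries constants, log factors, and the isoperimetry hypothesis). With n data points in ambient dimension d and a model class with p parameters, mere interpolation is possible once p is on the order of n, but any interpolant f from such a class obeys

```latex
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}}
\qquad\Longrightarrow\qquad
\text{$O(1)$-Lipschitz (smooth) interpolation requires } p \gtrsim n d,
```

i.e. a factor of d more parameters than are needed for interpolation alone, which is the "d times more parameters" statement in the abstract.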


How social media encourages the worst of AI boosterism

MIT Technology Review

The era of hype first, think later. Demis Hassabis, CEO of Google DeepMind, summed it up in three words: "This is embarrassing." Hassabis was replying on X to an overexcited post by Sébastien Bubeck, a research scientist at the rival firm OpenAI, announcing that two mathematicians had used OpenAI's latest large language model, GPT-5, to find solutions to 10 unsolved problems in mathematics. "Science acceleration via AI has officially begun," Bubeck crowed. Put your math hats on for a minute, and let's take a look at what this beef from mid-October was about. Bubeck was excited that GPT-5 seemed to have somehow solved a number of puzzles known as Erdős problems.


Small Language Models for Application Interactions: A Case Study

Beibin Li, Yi Zhang, Sébastien Bubeck, Jeevan Pathuri, Ishai Menache

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are becoming pervasive in assisting humans with a wide variety of tasks, such as writing documents, presenting work, coding, and health assistance. Generative LLMs are being rapidly integrated into user-facing software, answering questions and increasing productivity through simple, language-based interactions with technology. One of the key operating principles behind LLMs is exploiting their ability to generalize to unseen tasks by providing examples through the prompt itself, an approach commonly known as in-context learning. While LLMs are being designed to support larger prompt sizes, processing very large prompts can be expensive and incur non-negligible latency. In this paper, we consider the alternative of using Small Language Models (SLMs), which are now being developed and open-sourced by several companies.
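The in-context learning idea described above, supplying worked input/output examples inside the prompt itself, can be sketched as a simple prompt builder. This is a generic illustration, not code from the paper; the helper name, format, and demonstration pairs are all hypothetical:

```python
def build_few_shot_prompt(examples, query):
    """Assemble an in-context-learning prompt: a few input/output
    demonstrations followed by the new query, left for the model
    to complete. (Illustrative sketch of the general technique.)"""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical sentiment-labeling demonstrations.
examples = [
    ("great product, works perfectly", "positive"),
    ("arrived broken and late", "negative"),
]
prompt = build_few_shot_prompt(examples, "battery died after a day")
```

The cost concern in the abstract is visible here: every demonstration pair is re-sent (and re-processed) with each query, so prompt length, and hence latency and expense, grows with the number of in-context examples.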