Goto

Collaborating Authors

 ville


CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

arXiv.org Machine Learning

Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains a challenging task. In particular, deciding when to stop sampling remains difficult when the stopping rule is data-dependent and the set of possible response labels is not known in advance. We study anytime-valid certification of a prespecified target answer as the unique mode of the model's response distribution, a guarantee distinct from answer correctness. We propose the Certification by Intersection-union Testing with Eprocesses (CITE) algorithm, which provably controls false certification at any prescribed level under arbitrary data-driven stopping, without requiring prior knowledge of the answer category set. We also prove a category-set-size-free stopping-time rate, establish matching minimax lower bounds up to constants in the main regime, and extend the construction to confidence-weighted voting. Simulations and LLM self-consistency experiments show empirical error control and improved certification in diffuse-tail settings.


Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data

arXiv.org Machine Learning

We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural ``Ville event'' $E_α$, this regret till time $T$ is bounded by $\ln^2(1/α)/V_T + \ln (1/α) + \ln \ln V_T$ up to universal constants, where $V_T$ is a nonnegative, nondecreasing, cumulative variance process. (The bound reduces to $\ln(1/α) + \ln \ln V_T$ if $V_T \geq \ln(1/α)$.) If the data were stochastic, then one can show that $E_α$ has probability at least $1-α$ under a wide class of distributions (eg: sub-Gaussian, symmetric, variance-bounded, etc.). In fact, we show that on the Ville event $E_0$ of probability one, the regret on every path in $E_0$ is eventually bounded by $\ln \ln V_T$ (up to constants). We explain how this work helps bridge the world of adversarial online learning (which usually deals with regret bounds for bounded data), with game-theoretic statistics (which can handle unbounded data, albeit using stochastic assumptions). In short, conditional regret bounds serve as a bridge between stochastic and adversarial betting.


Linking Named Entities in Diderot's \textit{Encyclop\'edie} to Wikidata

arXiv.org Artificial Intelligence

Diderot's \textit{Encyclop\'edie} is a reference work from XVIIIth century in Europe that aimed at collecting the knowledge of its era. \textit{Wikipedia} has the same ambition with a much greater scope. However, the lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved. A key element of \textit{Wikipedia} is Wikidata that backs the articles with a graph of structured data. In this paper, we describe the annotation of more than 10,300 of the \textit{Encyclop\'edie} entries with Wikidata identifiers enabling us to connect these entries to the graph. We considered geographic and human entities. The \textit{Encyclop\'edie} does not contain biographic entries as they mostly appear as subentries of locations. We extracted all the geographic entries and we completely annotated all the entries containing a description of human entities. This represents more than 2,600 links referring to locations or human entities. In addition, we annotated more than 9,500 entries having a geographic content only. We describe the annotation process as well as application examples. This resource is available at https://github.com/pnugues/encyclopedie_1751


Better-than-KL PAC-Bayes Bounds

arXiv.org Machine Learning

Let $f(\theta, X_1),$ $ \dots,$ $ f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data-dependent posterior distribution $P_n$. In this paper, we consider the problem of proving concentration inequalities to estimate the mean of the sequence. An example of such a problem is the estimation of the generalization error of some predictor trained by a stochastic algorithm, such as a neural network where $f$ is a loss function. Classically, this problem is approached through a PAC-Bayes analysis where, in addition to the posterior, we choose a prior distribution which captures our belief about the inductive bias of the learning problem. Then, the key quantity in PAC-Bayes concentration bounds is a divergence that captures the complexity of the learning problem where the de facto standard choice is the KL divergence. However, the tightness of this choice has rarely been questioned. In this paper, we challenge the tightness of the KL-divergence-based bounds by showing that it is possible to achieve a strictly tighter bound. In particular, we demonstrate new high-probability PAC-Bayes bounds with a novel and better-than-KL divergence that is inspired by Zhang et al. (2022). Our proof is inspired by recent advances in regret analysis of gambling algorithms, and its use to derive concentration inequalities. Our result is first-of-its-kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL. Thus, we believe our work marks the first step towards identifying optimal rates of PAC-Bayes bounds.


Matrix Supermartingales and Randomized Matrix Concentration Inequalities

arXiv.org Machine Learning

These inequalities are often randomized in a way that renders them strictly tighter than existing deterministic results in the literature, are typically expressed in the Loewner order, and are sometimes valid at arbitrary data-dependent stopping times. Along the way, we explore the theory of matrix supermartingales and maximal inequalities, potentially of independent interest.


Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance

arXiv.org Machine Learning

In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.


The extended Ville's inequality for nonintegrable nonnegative supermartingales

arXiv.org Machine Learning

Following initial work by Robbins, we rigorously present an extended theory of nonnegative supermartingales, requiring neither integrability nor finiteness. In particular, we derive a key maximal inequality foreshadowed by Robbins, which we call the extended Ville's inequality, that strengthens the classical Ville's inequality (for integrable nonnegative supermartingales), and also applies to our nonintegrable setting. We derive an extension of the method of mixtures, which applies to $\sigma$-finite mixtures of our extended nonnegative supermartingales. We present some implications of our theory for sequential statistics, such as the use of improper mixtures (priors) in deriving nonparametric confidence sequences and (extended) e-processes.


China Miéville and the Politics of Surrealism

The New Yorker

China Miéville has long had spiders on the brain. In his breakthrough novel, 2000's "Perdido Street Station," a mysterious, spiderlike being called the Weaver assists a scientist named Isaac who's trying to save the fantastical city of Bas-Lag from a catastrophic infestation. In Miéville's new novella, "The Last Days of New Paris," the streets of Paris in 1950 have gone haywire after the detonation of a reality-altering bomb that brings various Surrealist works to frightening life, including an arachnoid manifestation reminiscent of Odilon Redon's painting "The Smiling Spider." "There's something about the arachnid," Miéville told me recently, on the phone from his home in London. Bataille writes about the spider as an avatar of formlessness, this very, very powerful thing.


A betting interpretation for probabilities and Dempster-Shafer degrees of belief

arXiv.org Artificial Intelligence

There are at least two ways to interpret numerical degrees of belief in terms of betting: (1) you can offer to bet at the odds defined by the degrees of belief, or (2) you can judge that a strategy for taking advantage of such betting offers will not multiply the capital it risks by a large factor. Both interpretations can be applied to ordinary additive probabilities and used to justify updating by conditioning. Only the second can be applied to Dempster-Shafer degrees of belief and used to justify Dempster's rule of combination.