Why playing is no laughing matter for otters

Popular Science

Play behavior is not all 'marshmallow science,' and more play can equal better health. From a forthcoming book by Heide Island, PhD, to be published on 4/28/26 by Avery, an imprint of Penguin Publishing Group, a division of Penguin Random House, LLC.

From behind a stand of frozen lupine, Patches, Crest, and Slash emerge onto the wetland. Moonshine reflects off the newly fallen snow, illuminating the predawn hour with a supernatural brightness. They halt beside a corrugated metal culvert, side by side, until Patches lurches forward and leaps onto the bank of Admirals Lake. Her landing fractures the frozen lakeshore, stamping an otter-sized divot. The two girls follow behind her, each landing with a loud crunch, leaving star-shaped bull's-eyes in the ice. The otters are out early, exploiting the cold; an icy lake makes for sluggish fish.


Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference

Chen, Li, Shen, Xiaotong, Pan, Wei

arXiv.org Machine Learning

Diffusion models [1, 2, 3] have emerged as a powerful class of generative models, achieving state-of-the-art performance across a wide range of applications, including imaging [2] and scientific-data synthesis [4]. From a statistical perspective, they can be viewed as flexible nonparametric estimators of a (conditional) distribution via score estimation and reverse-time stochastic differential equations (SDEs) [5, 6]. Despite this expressive power, standard diffusion models are typically causality-agnostic: they learn a joint law without encoding the directional asymmetries required for causal interpretation. As a consequence, they do not, on their own, provide principled answers to interventional queries or support broader causal analyses, which are central to structural causal models (SCMs) [7]. When a causal ordering (or a directed acyclic graph) is available, it is natural to construct generative procedures that sample variables sequentially according to the causal factorisation. Such iterative, ordering-respecting approaches have been proposed using a variety of generative models, including generative adversarial networks [8], variational autoencoders [9], normalising flows [10], and diffusion-based constructions such as DDIM [11]. However, a rigorous statistical understanding of the advantages of exploiting such causal structure and the inferential use of the resulting generator remain less developed.
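
To make the ordering-respecting idea concrete, here is a minimal sketch of ancestral sampling from a toy linear-Gaussian SCM along a topological order, including a do()-intervention. The variable names, edge weights, and linear-Gaussian conditionals are assumptions made for the example; the paper's construction would replace each closed-form conditional with a learned diffusion-based sampler.

```python
# Minimal sketch: ordering-respecting (ancestral) sampling from a toy
# linear-Gaussian SCM, with an optional do()-intervention. Illustrative
# only; it is not the paper's diffusion-based construction.
import numpy as np

rng = np.random.default_rng(0)

# DAG over (X1, X2, X3): X1 -> X2 -> X3, plus X1 -> X3.
parents = {"X1": [], "X2": ["X1"], "X3": ["X1", "X2"]}
weights = {"X2": {"X1": 0.8}, "X3": {"X1": -0.5, "X2": 1.2}}
order = ["X1", "X2", "X3"]  # a topological order of the DAG

def ancestral_sample(n, do=None):
    """Sample variables sequentially along the causal ordering.
    `do` maps a variable name to a fixed interventional value."""
    do = do or {}
    sample = {}
    for v in order:
        if v in do:                      # intervention: cut incoming edges
            sample[v] = np.full(n, do[v])
            continue
        mean = sum(weights[v][p] * sample[p] for p in parents[v]) if parents[v] else 0.0
        sample[v] = mean + rng.standard_normal(n)   # exogenous noise
    return sample

obs = ancestral_sample(10_000)                   # observational law
intv = ancestral_sample(10_000, do={"X2": 2.0})  # law of X3 under do(X2 = 2)
print(obs["X3"].mean(), intv["X3"].mean())
```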


Adaptive multi-fidelity optimization with fast learning rates

Fiegel, Côme, Gabillon, Victor, Valko, Michal

arXiv.org Machine Learning

In multi-fidelity optimization, biased approximations of the target function are available at varying costs. This paper studies the problem of optimizing a locally smooth function with a limited budget, where the learner must trade off the cost and the bias of these approximations. We first prove lower bounds on the simple regret under different assumptions on the fidelities, based on a cost-to-bias function. We then present the Kometo algorithm, which achieves the same rates up to additional logarithmic factors without any knowledge of the function's smoothness or the fidelity assumptions, improving previously proven guarantees. Finally, we show empirically that our algorithm outperforms previous multi-fidelity optimization methods without knowledge of problem-dependent parameters.
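
As a hedged illustration of the setting (not the Kometo algorithm), the sketch below queries a target function at fidelities of different costs, where a higher-cost fidelity buys a smaller bias bound; the cost and bias shapes and the crude refinement schedule are assumptions of the example.

```python
# Minimal sketch of the multi-fidelity setting: each fidelity z has a cost
# and a bias bound, and the learner spends a budget across queries.
# Illustrative only; this is not the Kometo algorithm.
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: -(x - 0.3) ** 2                  # target function (unknown)

def query(x, z):
    """Fidelity z in (0, 1]: cost grows with z, bias shrinks with z."""
    cost = z                                   # paying more ...
    bias = 0.5 * (1.0 - z)                     # ... buys a smaller bias bound
    noisy = f(x) + rng.uniform(-bias, bias)    # biased approximation
    return noisy, cost

budget, spent, best = 10.0, 0.0, (None, -np.inf)
while spent < budget:
    z = min(1.0, 0.1 + spent / budget)         # crude schedule: refine over time
    x = rng.uniform(0, 1)
    val, cost = query(x, z)
    spent += cost
    if val > best[1]:
        best = (x, val)
print("best point found:", best)
```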


Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

Ishtiaque, Nafiz, Haque, Syed Arefinul, Alam, Kazi Ashraful, Jahara, Fatima

arXiv.org Machine Learning

We prove that conditional diffusion models whose reverse kernels are finite Gaussian mixtures with ReLU-network logits can approximate suitably regular target distributions arbitrarily well in context-averaged conditional KL divergence, up to an irreducible terminal mismatch that typically vanishes with increasing diffusion horizon. A path-space decomposition reduces the output error to this mismatch plus per-step reverse-kernel errors; assuming each reverse kernel factors through a finite-dimensional feature map, each step becomes a static conditional density approximation problem, solved by composing Norets' Gaussian-mixture theory with quantitative ReLU bounds. Under exact terminal matching the resulting neural reverse-kernel class is dense in conditional KL.
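
As a concrete, purely illustrative instance of the reverse-kernel class analyzed here, the sketch below runs a reverse chain whose kernels are finite Gaussian mixtures with mixing logits produced by a small ReLU network over a finite-dimensional feature map of state, context, and step. All weights, component means, and scales are fixed at random rather than trained, which is an assumption of the example.

```python
# Minimal sketch of a reverse kernel of the studied form: a finite Gaussian
# mixture whose mixing logits come from a ReLU network applied to a
# finite-dimensional feature map phi(x_t, c, t). Hand-set, not trained.
import numpy as np

rng = np.random.default_rng(1)
K, D_FEAT = 8, 4                       # mixture components, feature dim

# Random ReLU network: features -> K logits (one hidden layer).
W1, b1 = rng.standard_normal((16, D_FEAT)), np.zeros(16)
W2, b2 = rng.standard_normal((K, 16)), np.zeros(K)
mu = rng.standard_normal(K)            # component means (1-D state for simplicity)
sigma = 0.3 * np.ones(K)               # component scales

def features(x_t, context, t):
    """Finite-dimensional feature map assumed by the per-step analysis."""
    return np.array([x_t, context, np.cos(t), np.sin(t)])

def reverse_kernel_sample(x_t, context, t):
    """Draw x_{t-1} from sum_k softmax(logits)_k * N(mu_k, sigma_k^2)."""
    h = np.maximum(W1 @ features(x_t, context, t) + b1, 0.0)  # ReLU layer
    logits = W2 @ h + b2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                      # softmax
    k = rng.choice(K, p=probs)                                # pick component
    return mu[k] + sigma[k] * rng.standard_normal()

x = rng.standard_normal()              # terminal draw, e.g. x_T ~ N(0, 1)
for t in range(10, 0, -1):             # run the reverse chain
    x = reverse_kernel_sample(x, context=0.5, t=t)
print(x)
```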


An Optimal Sauer Lemma Over $k$-ary Alphabets

Hanneke, Steve, Meng, Qinglin, Moran, Shay, Shaeiri, Amirreza

arXiv.org Machine Learning

The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of the VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely--Shalev-Shwartz (DS) dimension and, more generally, in terms of its extension, the list-DS dimension -- the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound with the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting; this motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al. (STOC 2023), Hanneke et al. (COLT 2024), and Brukhim et al. (NeurIPS 2024).
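
For orientation, the classical binary statement being generalized is the following standard bound, stated here for reference (it is not the paper's new inequality):

```latex
% Sauer-Shelah-Perles: for a class C \subseteq \{0,1\}^n
% of VC dimension d, the size of C grows only polynomially in n:
|C| \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;=\; O\!\left(n^{d}\right).
```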


Lipschitz bounds for integral kernels

Reverdi, Justin, Zhang, Sixin, Gamboa, Fabrice, Gratton, Serge

arXiv.org Machine Learning

Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and apply these results to several important classes of kernels. For infinite-width two-layer neural networks with isotropic Gaussian weight distributions, we show that the Lipschitz constant of the associated kernel can be expressed as the supremum of a two-dimensional integral, leading to an explicit characterization for the Gaussian kernel and the ReLU random neural network kernel. We also study continuous and shift-invariant kernels, such as the Gaussian, Laplace, and Matérn kernels, which admit an interpretation as neural networks with a cosine activation function. In this setting, we prove that the feature map is Lipschitz continuous if and only if the weight distribution has a finite second-order moment, and we then derive its Lipschitz constant. Finally, we raise an open question concerning the asymptotic behavior of the Lipschitz constant in finite-width neural networks; numerical experiments supporting this behavior are provided.
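
A small numerical sketch, assuming the random-Fourier-feature view of a shift-invariant kernel mentioned above: the feature map is a cosine-activation "network" with random weights and phases, and its Lipschitz constant is controlled by the second moment of the weight distribution. The finite-difference check below is an illustration under these assumptions, not the paper's bound.

```python
# Random Fourier features phi(x)_i = sqrt(2/m) * cos(w_i x + b_i) for the
# Gaussian kernel (w ~ N(0, 1), so E||w||^2 is finite and phi is Lipschitz).
import numpy as np

rng = np.random.default_rng(2)
m = 50_000                                   # number of random features

w = rng.standard_normal(m)                   # Gaussian-kernel weight law
b = rng.uniform(0.0, 2 * np.pi, m)           # random phases

def phi(x):
    """Random Fourier feature map of a 1-D shift-invariant kernel."""
    return np.sqrt(2.0 / m) * np.cos(w * x + b)

# Crude Lipschitz estimate over random pairs vs. the second-moment bound
# ||phi'(x)|| <= sqrt((2/m) * sum_i w_i^2) ~ sqrt(2 * E[w^2]).
xs, ys = rng.uniform(-3, 3, 200), rng.uniform(-3, 3, 200)
ratios = [np.linalg.norm(phi(a) - phi(c)) / abs(a - c) for a, c in zip(xs, ys)]
print("empirical max ratio :", max(ratios))
print("second-moment bound :", np.sqrt(2.0 / m * np.sum(w**2)))
```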


Refined Detection for Gumbel Watermarking

Lattimore, Tor

arXiv.org Machine Learning

We propose a simple detection mechanism for the Gumbel watermarking scheme of Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes, under the assumption that the next-token distributions are sampled i.i.d.
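
For context, here is a minimal sketch of the Gumbel watermarking scheme of Aaronson (2022) together with the standard sum-statistic detector; the paper's refined, problem-dependent detector is not reproduced here, and the keying scheme, vocabulary size, and toy next-token distributions are illustrative assumptions.

```python
# Gumbel watermarking: sample the token maximizing r_i^(1/p_i), where the
# uniforms r_i are pseudorandom given a secret key and the prefix. The
# baseline detector sums -log(1 - r_token), which is Exp(1) on unmarked text.
import hashlib
import numpy as np

VOCAB = 100

def keyed_uniforms(key, prefix):
    """Pseudorandom uniforms r_1..r_V, seeded by a secret key and the prefix."""
    digest = hashlib.sha256((key + "|" + ",".join(map(str, prefix))).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).uniform(size=VOCAB)

def gumbel_sample(p, r):
    """Aaronson's rule: argmax_i r_i^(1/p_i); marginally distributed as p."""
    with np.errstate(divide="ignore"):
        return int(np.argmax(np.log(r) / p))   # argmax r^(1/p) == argmax log(r)/p

def detect(key, tokens):
    """Average sum statistic: ~1 on unmarked text, noticeably larger if marked."""
    score = 0.0
    for t, tok in enumerate(tokens):
        r = keyed_uniforms(key, tokens[:t])
        score += -np.log1p(-r[tok])            # -log(1 - r_tok)
    return score / len(tokens)

rng = np.random.default_rng(3)
key, text = "secret", []
for t in range(200):                           # generate watermarked text
    p = rng.dirichlet(np.ones(VOCAB))          # i.i.d. next-token distributions
    text.append(gumbel_sample(p, keyed_uniforms(key, text)))
print("watermarked score:", detect(key, text))
print("unmarked score   :", detect(key, list(rng.integers(0, VOCAB, 200))))
```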


Calorimeter Shower Superresolution with Conditional Normalizing Flows: Implementation and Statistical Evaluation

Cosso, Andrea

arXiv.org Machine Learning

In High Energy Physics, detailed calorimeter simulations and reconstructions are essential for accurate energy measurements and particle identification, but their high granularity makes them computationally expensive. Developing data-driven techniques capable of recovering fine-grained information from coarser readouts, a task known as calorimeter superresolution, offers a promising way to reduce both computational and hardware costs while preserving detector performance. This thesis investigates whether a generative model originally designed for fast simulation can be effectively applied to calorimeter superresolution. Specifically, the model proposed in arXiv:2308.11700 is re-implemented independently and trained on the CaloChallenge 2022 dataset based on the Geant4 Par04 calorimeter geometry. Finally, the model's performance is assessed through a rigorous statistical evaluation framework, following the methodology introduced in arXiv:2409.16336, to quantitatively test its ability to reproduce the reference distributions.


Topological Detection of Hopf Bifurcations via Persistent Homology: A Functional Criterion from Time Series

Barrios, Jhonathan, Echávez, Yásser, Álvarez, Carlos F.

arXiv.org Machine Learning

We propose a topological framework for the detection of Hopf bifurcations directly from time series, based on persistent homology applied to phase-space reconstructions via Takens embedding, within the framework of Topological Data Analysis. The central idea is that changes in the dynamical regime are reflected in the emergence or disappearance of a dominant one-dimensional homological feature in the reconstructed attractor. To quantify this behavior, we introduce a simple and interpretable scalar topological functional, defined as the maximum persistence of homology classes in dimension one. This functional is used to construct a computable criterion for identifying critical parameters in families of dynamical systems without requiring knowledge of the underlying equations. The proposed approach is validated on representative systems of increasing complexity, showing consistent detection of the bifurcation point. The results support the interpretation of dynamical transitions as topological phase transitions and demonstrate the potential of topological data analysis as a model-free tool for the quantitative analysis of nonlinear time series.
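
A minimal sketch of the proposed functional, assuming the ripser package for persistent homology: delay-embed a scalar time series (Takens embedding), compute the H1 persistence diagram, and report the maximum persistence. The delay and embedding-dimension values, and the two synthetic signals, are illustrative choices.

```python
# Maximum H1 persistence of a Takens-embedded time series: large for a
# post-Hopf limit cycle (a loop in phase space), near zero for a fixed point.
import numpy as np
from ripser import ripser  # pip install ripser

def takens_embedding(x, dim=2, delay=5):
    """Delay embedding: rows are (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)

def max_h1_persistence(x, dim=2, delay=5):
    """Topological functional: max (death - birth) over 1-D homology classes."""
    cloud = takens_embedding(x, dim, delay)
    h1 = ripser(cloud)["dgms"][1]            # H1 persistence diagram
    return 0.0 if len(h1) == 0 else float(np.max(h1[:, 1] - h1[:, 0]))

t = np.linspace(0, 20 * np.pi, 600)
limit_cycle = np.cos(t) + 0.05 * np.random.default_rng(4).standard_normal(len(t))
fixed_point = 0.05 * np.random.default_rng(5).standard_normal(len(t))
print("post-Hopf (cycle) :", max_h1_persistence(limit_cycle))  # dominant loop
print("pre-Hopf (point)  :", max_h1_persistence(fixed_point))  # ~0
```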


Unfolding with a Wasserstein Loss

Craig, Katy, Faktor, Benjamin, Nachman, Benjamin

arXiv.org Machine Learning

Data unfolding -- the removal of noise or artifacts from measurements -- is a fundamental task across the experimental sciences. Of particular interest in the present work are applications of data unfolding in physics, in which context the dominant approach is Richardson-Lucy (RL) deconvolution. The classical RL approach aims to find denoised data that, once passed through the noise model, is as close as possible to the measured data in terms of Kullback-Leibler (KL) divergence. Fundamental to this approach is the hypothesis that the support of the measured data overlaps with the output of the noise model, so that the KL divergence correctly captures their similarity. In practice, this hypothesis is typically enforced by binning the measured data and noise model, introducing numerical error into the unfolding process. As a counterpoint to classical binned methods for unfolding, the present work studies an alternative formulation of the unfolding problem, using a Wasserstein loss instead of the KL divergence to quantify the similarity between the measured data and the output of the noise model. We establish sharp conditions for the existence and uniqueness of optimizers; as a consequence, we answer open questions of Li et al. [23] regarding necessary conditions for existence and uniqueness in the case of transport-map noise models. Following these theoretical results, we develop a provably convergent generalized Sinkhorn algorithm to compute approximate optimizers. Our algorithm requires only empirical observations of the noise model and measured data, and it scales with the size of the data rather than the ambient dimension.
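
For background, here is a minimal sketch of the classical entropic Sinkhorn iteration on a discrete cost matrix; the paper's provably convergent *generalized* Sinkhorn scheme for the unfolding objective is a related but different algorithm, and the toy marginals and cost below are illustrative assumptions.

```python
# Classical entropic optimal transport via Sinkhorn scaling: alternately
# rescale a Gibbs kernel so the transport plan matches both marginals.
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Find a plan P with marginals (a, b) minimizing <P, C> + eps * entropy term."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # match column marginal
        u = a / (K @ v)                   # match row marginal
    return u[:, None] * K * v[None, :]    # transport plan

# Toy 1-D example: uniform source, peaked target, squared-distance cost.
x = np.linspace(0, 1, 50)
a = np.full(50, 1 / 50)
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("marginal error  :", np.abs(P.sum(axis=1) - a).max())
print("entropic OT cost:", (P * C).sum())
```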