SPDE Methods for Nonparametric Bayesian Posterior Contraction and Laplace Approximation

Alberola-Boloix, Enric, Casado-Telletxea, Ioar

arXiv.org Machine Learning

We derive posterior contraction rates (PCRs) and finite-sample Bernstein-von Mises (BvM) results for nonparametric Bayesian models by extending the diffusion-based framework of Mou et al. (2024) to the infinite-dimensional setting. The posterior is represented as the invariant measure of a Langevin stochastic partial differential equation (SPDE) on a separable Hilbert space, which allows us to control posterior moments and obtain non-asymptotic concentration rates in Hilbert norms under various likelihood curvature and regularity conditions. We also establish a quantitative Laplace approximation for the posterior. The theory is illustrated in a nonparametric linear Gaussian inverse problem.


When Your Model Stops Working: Anytime-Valid Calibration Monitoring

Farran, Tristan

arXiv.org Machine Learning

Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will eventually raise a false alarm, even when the model remains perfectly stable. Existing methods typically lack formal error guarantees, conflate alarm time with changepoint location, and monitor indirect signals that do not fully characterize calibration. We present PITMonitor, an anytime-valid calibration-specific monitor that detects distributional shifts in probability integral transforms via a mixture e-process, providing Type I error control over an unbounded monitoring horizon as well as Bayesian changepoint estimation. On river's FriedmanDrift benchmark, PITMonitor achieves detection rates competitive with the strongest baselines across all three scenarios, although detection delay is substantially longer under local drift.
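The idea of anytime-valid monitoring through a mixture e-process can be illustrated with a minimal sketch. This is not PITMonitor's actual construction, which the abstract does not specify. It assumes a simple betting scheme: under perfect calibration the probability integral transforms (PITs) are uniform on [0, 1], so each factor 1 + lam * (u - 0.5) with |lam| <= 2 is nonnegative and has expectation 1. An equal-weight mixture of the resulting wealth processes is then a test martingale, and by Ville's inequality it exceeds 1/alpha with probability at most alpha over the whole stream. The function names and the grid of lam values are hypothetical, and this particular bet is only sensitive to shifts in the mean of the PITs.

```python
def mixture_e_process(pits, lambdas=(-1.5, -0.75, 0.75, 1.5)):
    """Return the running mixture wealth after each PIT value.

    Illustrative assumption: each bet multiplies its wealth by
    1 + lam * (u - 0.5), which has mean 1 when u ~ Uniform(0, 1)
    and stays nonnegative because |lam| <= 2.
    """
    if any(abs(lam) > 2.0 for lam in lambdas):
        raise ValueError("each lambda must satisfy |lambda| <= 2")
    wealth = [1.0] * len(lambdas)  # one wealth process per bet size
    path = []
    for u in pits:
        if not 0.0 <= u <= 1.0:
            raise ValueError("PIT values must lie in [0, 1]")
        wealth = [w * (1.0 + lam * (u - 0.5)) for w, lam in zip(wealth, lambdas)]
        # Equal-weight mixture of the individual wealth processes.
        path.append(sum(wealth) / len(wealth))
    return path


def first_alarm(path, alpha=0.05):
    """Return the 1-based index where the e-process first reaches 1/alpha, else None."""
    threshold = 1.0 / alpha
    for t, e_value in enumerate(path, start=1):
        if e_value >= threshold:
            return t
    return None


if __name__ == "__main__":
    calibrated = [0.5] * 50   # degenerate stream: every bet factor equals 1
    shifted = [0.9] * 50      # PITs piled up near 1: miscalibration
    print(first_alarm(mixture_e_process(calibrated)))
    print(first_alarm(mixture_e_process(shifted)))
```

Because the alarm threshold is applied to the running mixture at every step, the check can be repeated after each new observation without inflating the false-alarm probability, which is the property that a repeatedly applied fixed-sample test lacks.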







Appendix

Neural Information Processing Systems

In particular, SQuARM-SGD [45] can be viewed as CHOCO-SGD with momentum, but its theoretical convergence rate is slower than that of the original CHOCO-SGD. We provide some examples of compression operators satisfying Definition 1 that are used in our experiments. Line 6), the penultimate line follows from W1 = 1, and the last line follows from the induction hypothesis at the t-th iteration. Line 3), in the second line we use the property of the mixing matrix 1 W = 1, and in the third line, we apply Young's inequality (cf. (9)). Bounding Ωt2 in (14b): Similar to the derivation of (14a), by applying the update rule of Gt in BEER (Line 8), the definition of compression operators (Definition 1), and Young's inequality, we have [...] It then boils down to establishing (26).
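As an illustration of a compression operator of the kind mentioned above, the sketch below implements top-k sparsification. Definition 1 is not reproduced in this excerpt, so the sketch assumes it is the usual contraction condition ||C(x) - x||^2 <= (1 - alpha) ||x||^2, which top-k is known to satisfy with alpha = k/d for a vector of dimension d. The function names are hypothetical, and whether top-k is among the operators used in the paper's experiments is not confirmed by the text here.

```python
def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    if not 0 < k <= len(x):
        raise ValueError("k must satisfy 0 < k <= len(x)")
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def squared_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


def contraction_holds(x, k, tol=1e-12):
    """Check the assumed condition ||C(x) - x||^2 <= (1 - k/d) ||x||^2 for top-k."""
    alpha = k / len(x)
    compressed = top_k(x, k)
    error = squared_norm([c - v for c, v in zip(compressed, x)])
    return error <= (1.0 - alpha) * squared_norm(x) + tol


if __name__ == "__main__":
    x = [3.0, -1.0, 0.5, 2.0]
    print(top_k(x, 2))
    print(contraction_holds(x, 2))
```

Only the k retained values and their indices need to be communicated, which is where the bandwidth saving comes from in compressed decentralized methods.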