monte carlo
Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models
Lai, Jinlin, Margossian, Charles C., Sheldon, Daniel R.
Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.
Your SaaS Is an Insurance Product: A Modeling Framework
Capped-usage SaaS products -- LLM subscriptions such as Claude Code and ChatGPT, cloud platforms such as Vercel and Cloudflare Workers, corporate benefit platforms, identity-verification services with liability transfer -- share a structural signature with insurance products: a fixed premium decoupled from realized consumption, stochastic per-user demand with heavy-tailed severity, a non-fungible cap that resets on a fixed schedule, and a portfolio-level exposure that requires reserve adequacy under tail risk. We argue that this is not an analogy. It is the same operational problem actuarial science has been tooled for decades to address, restated with new dependent variables (tokens, bandwidth bytes, function-invocations, gym check-ins) in place of medical claims. This paper proposes a modeling framework for capped-usage SaaS pricing built from frequency-severity decomposition, premium calculation principles, and Monte Carlo reserve adequacy. We map the framework to publicly observable subscription tiers in two domains (LLM services and cloud platforms), ground it in canonical health-insurance economics (Arrow 1963; Pauly 1968; Manning et al. 1987; Brot-Goldberg et al. 2017), and demonstrate divergence from traditional unit economics through a worked example. The contribution is operational rather than theoretical: not a new theorem, but vocabulary and tools currently absent from cs.LG/stat.ML practice.
Estimating the expected output of wide random MLPs more efficiently than sampling
Wu, Wilson, Lecomte, Victor, Winer, Michael, Robinson, George, Hilton, Jacob, Christiano, Paul
By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.
Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature
Kernel quadrature (KQ) is a kernel-based approach to numerical integration, closely related to Bayesian quadrature (BQ) and probabilistic integration [38, 39, 10]. For sufficiently regular integrands, KQ can exploit spectral structure in a reproducing kernel Hilbert space (RKHS) that is invisible to plain Monte Carlo and thereby converge faster than the usual O(N 1/2) rate in the number of points [3, 28]. Unconstrained kernel-based rules, however, may produce numerically unstable weights, motivating longstanding interest in positively weighted rules [13, 21, 29, 46]. In this paper, positive weights mean nonnegative weights that sum to one, i.e., simplex or convex-combination weights. Whether positive-weight KQ can systematically improve over Monte Carlo is a subtle question. Kernel herding and related constructions suggested fast rates under favorable assumptions [13], but the conditional-gradient viewpoint of Bach et al. [4] clarified that the strongest such assumptions are not generally available in infinite-dimensional RKHSs. Subsequent herding-type analyses in broad RKHS settings have therefore mostly remained at the Monte-Carlo scale, except under additional structure or modified algorithms such as sparse herding variants [31, 44, 43]. Beyond herding, subsampling-based positive KQ methods such as thinning [16, 15] and recombination [21, 24] have obtained rates beyond Monte Carlo, but a general mechanism for such improvement in the simple i.i.d.
A Geometry-Aware Residual Correction of Hagan's SABR Implied Volatility Formula
Reghai, Adil, Tarsissi, Lama, Biau, Gérard, Lipton, Alex
This paper proposes a hybrid methodology to improve the approximation of SABR (Stochastic Alpha Beta Rho) implied volatility by combining analytical structure with machine learning. The approach augments the neural-network input representation with geometric features derived from the stochastic differential equations of the SABR model. Unlike approaches that fully replace analytical formulas with black-box models, the proposed framework preserves the analytical backbone of the model. The hybridization operates along two complementary dimensions. First, geometry-aware variables reflecting intrinsic properties of the SABR dynamics are used as structured inputs to the network. Second, the neural network is trained to learn the residual error relative to Hagan's closed-form approximation rather than implied volatility directly. The resulting model acts as a structured residual correction to the analytical formula, retaining interpretability while capturing higher-order effects that are not included in the asymptotic expansion. Numerical experiments conducted over realistic parameter domains, as well as stressed environments, show that the method improves accuracy and robustness compared with both analytical approximations and standard neural-network approaches. Because the correction remains lightweight and structurally consistent with the underlying model, the framework is well suited for real-time pricing and calibration in practical trading environments.
Amortized Variational Inference for Joint Posterior and Predictive Distributions in Bayesian Uncertainty Quantification
Bayesian predictive inference propagates parameter uncertainty to quantities of interest through the posterior-predictive distribution. In practice, this is typically performed using a two-stage procedure: first approximating the posterior distribution of model parameters, and then propagating posterior samples through the predictive model via Monte Carlo simulation. This sequential workflow can be computationally demanding, particularly for high-fidelity models such as those governed by partial differential equations. We propose a variational Bayesian framework that directly targets the posterior-predictive distribution and jointly learns variational approximations of both the posterior and the corresponding predictive distribution. The formulation introduces a variational upper bound on the Kullback--Leibler divergence together with moment-based regularization terms. The variational distributions are trained in an amortized manner, shifting computational effort to an offline stage and enabling efficient online inference. Numerical experiments ranging from analytical benchmarks to a finite-element solid mechanics problem demonstrate that the proposed method achieves more accurate predictive distributions than conventional two-stage variational inference, while substantially reducing the cost of online predictive inference.
Decentralized Proximal Stochastic Gradient Langevin Dynamics
Islam, Mohammad Rafiqul, Zhu, Lingjiong
Decentralized learning is a learning process in which data is distributed across computational agents or collected by individual agents, and model parameters are computed as the consensus of the agents. It has gained a lot of interest for applications where agents can collaboratively learn a predictive model without sharing their own data, but sharing only their local models with their immediate neighbors to generate a global model [He et al., 2018, Hendrikx et al., 2019, Arjevani et al., 2020]. We assume there are N agents who are connected over an undirected communication network G = (V,E) where V = {1,...,N} represents the agents and E V V denotes the set of edges; i.e., if agent i and j are connected then (i,j) E implies (j,i) E. Suppose we have a collection of n independent and identically distributed (i.i.d.) data pairs zi = (ai,yi), where ai Rp is the feature vector and yi the label or response of the i-th observation. Let Z = [z1,z2,,zn] Rnp be sampled from the distribution p(Z|x) where the parameter x Rd has a common prior. The goal is to sample from the posterior distribution p(x|Z) p(Z|x)p(x) by distributing Z among N agents such that Zi = {zi1,zi2,,zini} is the subset of data exclusive to agent i.
Learning Rate Free Sampling in Constrained Domains
We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing constrained sampling algorithms, including mirrored Langevin dynamics and mirrored Stein variational gradient descent. We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex, sampling with fairness constraints, and constrained sampling problems in postselection inference. Our results indicate that our algorithms achieve competitive performance with existing constrained sampling methods, without the need to tune any hyperparameters.
Langevin Quasi-Monte Carlo
Langevin Monte Carlo (LMC) and its stochastic gradient versions are powerful algorithms for sampling from complex high-dimensional distributions. To sample from a distribution with density π(θ) exp( U(θ)), LMC iteratively generates the next sample by taking a step in the gradient direction U with added Gaussian perturbations. Expectations w.r.t. the target distribution π are estimated by averaging over LMC samples. In ordinary Monte Carlo, it is well known that the estimation error can be substantially reduced by replacing independent random samples by quasi-random samples like low-discrepancy sequences. In this work, we show that the estimation error of LMC can also be reduced by using quasirandom samples. Specifically, we propose to use completely uniformly distributed (CUD) sequences with certain low-discrepancy property to generate the Gaussian perturbations. Under smoothness and convexity conditions, we prove that LMC with a low-discrepancy CUD sequence achieves smaller error than standard LMC. The theoretical analysis is supported by compelling numerical experiments, which demonstrate the effectiveness of our approach.