AITopics

2605.20635

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Nguyen, Thien V., Habrard, Amaury, Guedj, Benjamin

A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning

Physics-informed machine learning (PIML) integrates mechanistic knowledge, typically in the form of partial differential equations (PDE), into data-driven models. Despite strong empirical performance, its statistical generalisation properties remain poorly understood, particularly in the regression setting with unbounded losses. Existing analyses rely on approximation or stability arguments and do not fully capture how physical structure influences generalisation from finite data. In this work, we develop a PAC-Bayesian framework for PIML that provides high-probability generalisation guarantees in the presence of unbounded losses. We adopt a multi-task perspective that jointly treats data fidelity, PDE residuals, initial and boundary conditions, avoiding the looseness induced by standard union-bound approaches. Our analysis leverages the structure of physics-informed objectives to derive novel bounds where the complexity scales with input-gradient norms of the losses, revealing a direct link between physical regularity and generalisation. We instantiate this framework under Sobolev and Poincaré-type assumptions, yielding two classes of bounds that trade off statistical complexity and smoothness in different regimes. Building on these results, we propose a self-bounding-aware learning algorithm that directly optimises tractable surrogates of the derived bounds, along with a practical procedure to estimate the associated constants in realistic settings. Empirical evaluations on standard PDE benchmarks demonstrate that our bounds are non-vacuous, significantly tighter than union-bound baselines, and can be effectively minimised during training. Overall, our results provide a principled statistical foundation for the generalisation of physics-informed models.

artificial intelligence, assumption, machine learning, (17 more...)

2605.26341

Country: Europe (1.00)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.84)

Constrained Bayesian Experimental Design via Online Planning

Guo, Yujia, Huang, Daolang, Zhang, Xinyu, Katt, Sammie, Kaski, Samuel, Bharti, Ayush

Bayesian experimental design (BED) is a principled framework for data-efficient design of sequential experiments. However, existing BED methods are unable to adapt to dynamic constraints inherent in real-world tasks due to budget limitations, varying costs, or physical constraints that restrict how designs evolve over time. In this paper, we introduce a novel approach to BED that enables constrained optimization of experimental designs by combining offline pre-training of an amortized policy and a posterior network with online multi-step lookahead planning using scenario trees. We empirically demonstrate that our method yields substantially more informative design sequences than existing methods across a range of constrained BED tasks, while incurring only a modest additional computational overhead.

artificial intelligence, machine learning, optimization problem, (13 more...)

2605.2699

Country: Europe (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Kang, Hyunmo, Levi, Noam Itzhak, Wegner, Corinna Elena, Korchinski, Daniel J., Wyart, Matthieu

Sampling Data with Chains of Forward-Backward Diffusion Steps

Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise when mixing is efficient -- signatures consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.

diffusion model, large language model, machine learning, (21 more...)

2605.27006

Country: North America > United States (0.67)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
(2 more...)

Warrior, Kane, Chakrabarty, Dalia

Gaussian Process-based learning with new MCMC-based implementation of Wishart prior on correlation matrix

Gaussian Process (GP) models are widely used as probabilistic models for nonlinear functions because they combine flexible function modelling with uncertainty quantification (Rasmussen and Williams, 2006; Williams, 1998; MacKay, 1992; Neal, 1995). Their predictive performance depends heavily on how kernel hyperparameters are learnt (Sundararajan and Keerthi, 2001). This becomes especially important in higher-dimensional multivariate settings, where many input-specific hyperparameters may be present and where only some inputs may contribute meaningful predictive structure (MacKay, 1992; Neal, 1995; Rasmussen and Williams, 2006; Linkletter et al., 2006; Paananen et al., 2019). In standard Bayesian formulations of GP learning, prior specification is usually imposed directly on kernel hyperparameters such as lengthscales, amplitude parameters, and noise terms (Rasmussen and Williams, 2006; Williams, 1998). This is natural from a modelling point of view, but it does not always give useful control over the covariance structure that those hyperparameters induce over the observed design points (Barnard et al., 2000; Gelman, 2006; Daniels and Kass, 1999; Huang and Wand, 2013). However, it is this induced covariance matrix that directly governs likelihood evaluation, numerical stability, and predictive behaviour (Rasmussen and Williams, 2006; Stein, 1999). 1

artificial intelligence, covariance matrix, machine learning, (17 more...)

2605.27093

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Liang, Yuchen, Shroff, Ness, Liang, Yingbin

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.

diffusion model, large language model, machine learning, (20 more...)

2605.27352

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Barnes, Leighton P., Dytso, Alex

Sub-Gaussian Concentration and Entropic Normality of the Maximum Likelihood Estimator

It is well known that, under standard regularity conditions, the maximum likelihood estimator (MLE) satisfies a central limit theorem and converges in distribution to a Gaussian random variable as the sample size grows. This paper strengthens this classical result by developing several stronger forms of asymptotic normality for the normalized MLE. With additional assumptions on the score, we first establish sub-Gaussian tail bounds and convergence of all moments for the normalized estimation error. We then prove an entropic central limit theorem for a smoothed version of the estimator, showing convergence in relative entropy to the limiting Gaussian law. When the Fisher information of the normalized estimate is bounded, or its density has bounded first derivative, we further show that the smoothing can be removed, yielding entropic normality of the MLE itself. The proofs develop auxiliary tools that may be of independent interest, including exponential consistency bounds, high-moment estimates, and entropy-control arguments for the estimator.

artificial intelligence, convergence, machine learning, (13 more...)

2605.07107

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Variance-Reduced Manifold Sampling via Polynomial-Maximization Density Estimation

Zabolotnii, Serhii

Uniform sampling on implicitly defined manifolds is a core primitive in motion planning, constrained simulation, and probabilistic machine learning. MASEM addresses this problem by entropy-maximizing resampling, but its resampling weights depend on a local k-nearest-neighbour density estimate whose errors can be amplified by aggressive resampling temperatures. We ask whether a polynomial-maximization moment estimator can replace the plug-in density rule without changing the surrounding MASEM architecture. The proposed PMM-MASEM module computes shell spacings from nested k-nearest-neighbour radii, estimates their standardized cumulants, and uses a gated PMM2/PMM3 estimator only when the spacing distribution departs from the flat Exp(1) regime; otherwise it falls back to the plug-in/MLE rule. This fallback is essential: on a flat homogeneous manifold the plug-in estimator is already the MLE, so PMM should not outperform it. A local Known-DGP Monte Carlo experiment confirms this gate: the selector returns MLE on flat Exp(1) spacings and reduces density MSE by 22--36% on asymmetric gamma and boundary-spacing regimes. The evidence is not uniformly positive: PMM3 worsens a platykurtic uniform spacing law, and a lightweight resampling-proxy experiment improves seven-lobes coverage but degrades the sine and swiss-roll proxies. The current evidence therefore supports an applicability-boundary result rather than a general MASEM improvement claim.

artificial intelligence, machine learning, manifold, (17 more...)

2605.19938

Country: Europe > Ukraine (0.47)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs

Celli, Lucia

Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network evaluated in a finite number of inputs, using multidimensional Edgeworth expansions of arbitrary order $4m-1$, with $m\in\mathbb{N}$. Assuming that the corresponding Gaussian limit has an invertible covariance matrix and that the activation function is polynomially bounded, we establish a bound of order $n^{-m}$ on the total variation distance between the law of the true network output and its Edgeworth approximation, with matching lower bounds. As an application, we quantify the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion. Our results are more general and also apply to sequences of conditionally Gaussian vectors converging to a Gaussian vector with invertible covariance.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2605.24072

Country: Europe > Luxembourg (0.40)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Niu, Chengmei, Garg, Sachin, Dereziński, Michał, Liao, Zhenyu

Debiasing Random Oblique Projections for Subsampled OLS and Fast CUR in High Dimensions

Random sampling is a fundamental tool in modern machine learning and numerical linear algebra for reducing the computational cost of large-scale matrix problems. Existing analyses, however, rely primarily on subspace embedding guarantees, which do not precisely characterize the statistical bias of nonlinear random oblique projections induced by sampling, which arises ubiquitously in subsampled least squares and fast low-rank approximation methods. Because (pseudo)inversion is nonlinear, these random oblique projections can be systematically biased even when the underlying sketch is unbiased, thereby introducing hidden bias into downstream least squares and low-rank approximation solutions. In this work, we develop a unified non-asymptotic theory for random oblique projections in high dimensions. We show that standard random sampling schemes generally induce a systematic statistical bias overlooked by classical subspace embedding-style analyses, and we propose a principled debiasing framework to correct it. We illustrate the power of the theory through two canonical applications. For subsampled least squares, we obtain sharp bias--variance characterizations, reveal previously unrecognized statistical suboptimality in widely used sampling schemes, and identify when debiasing yields provable improvements. For fast CUR decomposition, we develop a debiased approach with improved approximation accuracy. Numerical experiments further validate our theoretical findings.

artificial intelligence, machine learning, theorem 4, (19 more...)

2605.24955

Country: North America > United States (0.92)

Genre: Research Report (1.00)

Industry:

Transportation (0.68)
Consumer Products & Services > Travel (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)