AITopics

2602.03202

Country:

North America > United States > Illinois > Cook County > Chicago (0.24)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Zeng, Jice, Barajas-Solano, David, Chen, Hui

Generative AI-enhanced Probabilistic Multi-Fidelity Surrogate Modeling Via Transfer Learning

The performance of machine learning surrogates is critically dependent on data quality and quantity. This presents a major challenge, as high-fidelity (HF) data is often scarce and computationally expensive to acquire, while low-fidelity (LF) data is abundant but less accurate. To address this data-scarcity problem, we develop a probabilistic multi-fidelity surrogate framework based on generative transfer learning. We employ a normalizing flow (NF) generative model as the backbone, which is trained in two phases: (i) the NF is first pretrained on a large LF dataset to learn a probabilistic forward model; (ii) the pretrained model is then fine-tuned on a small HF dataset, allowing it to correct for LF-HF discrepancies via knowledge transfer. To relax the dimension-preserving constraint of standard bijective NFs, we integrate surjective (dimension-reducing) layers with standard coupling blocks. This architecture enables learned dimension reduction while preserving the ability to train with exact likelihoods. The resulting surrogate provides fast probabilistic predictions with quantified uncertainty and significantly outperforms LF-only baselines while using fewer HF evaluations. We validate the approach on a reinforced concrete slab benchmark, combining many coarse-mesh (LF) simulations with a limited set of fine-mesh (HF) simulations. The proposed model achieves probabilistic predictions with HF accuracy, demonstrating a practical path toward data-efficient, generative AI-driven surrogates for complex engineering systems.

Email address: David.Barajas-Solano@pnnl.gov (David Barajas-Solano)

Introduction

High-fidelity (HF) computer modeling using discretization schemes such as the finite element (FE) method provides a rigorous framework for analyzing and predicting the behavior of complex engineering systems.

artificial intelligence, machine learning, natural language, (21 more...)

2602.00072

Country:

Asia > China > Hubei Province > Wuhan (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Construction Materials (0.69)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Durmus, Alain, Noble, Maxence, Pellerin, Thibaut

Sampling from multi-modal distributions on Riemannian manifolds with training-free stochastic interpolants

In this paper, we propose a general methodology for sampling from un-normalized densities defined on Riemannian manifolds, with a particular focus on multi-modal targets that remain challenging for existing sampling methods. Inspired by the framework of diffusion models developed for generative modeling, we introduce a sampling algorithm based on the simulation of a non-equilibrium deterministic dynamics that transports an easy-to-sample noise distribution toward the target. At the marginal level, the induced density path follows a prescribed stochastic interpolant between the noise and target distributions, specifically constructed to respect the underlying Riemannian geometry. In contrast to related generative modeling approaches that rely on machine learning, our method is entirely training-free. It instead builds on iterative posterior sampling procedures using only standard Monte Carlo techniques, thereby extending recent diffusion-based sampling methodologies beyond the Euclidean setting. We complement our approach with a rigorous theoretical analysis and demonstrate its effectiveness on a range of multi-modal sampling problems, including high-dimensional and heavy-tailed examples.

artificial intelligence, machine learning, manifold, (19 more...)

2602.00641

Country:

Europe > Austria > Vienna (0.14)
Europe > France (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Daudel, Kamélia, Tran, Minh-Ngoc, Zhang, Cheng

Importance Weighted Variational Inference without the Reparameterization Trick

Importance weighted variational inference (VI) approximates densities known up to a normalizing constant by optimizing bounds that tighten with the number of Monte Carlo samples $N$. Standard optimization relies on reparameterized gradient estimators, which are well-studied theoretically yet restrict both the choice of the data-generating process and the variational approximation. While REINFORCE gradient estimators do not suffer from such restrictions, they lack rigorous theoretical justification. In this paper, we provide the first comprehensive analysis of REINFORCE gradient estimators in importance weighted VI, leveraging this theoretical foundation to diagnose and resolve fundamental deficiencies in current state-of-the-art estimators. Specifically, we introduce and examine a generalized family of variational inference for Monte Carlo objectives (VIMCO) gradient estimators. We prove that state-of-the-art VIMCO gradient estimators exhibit a vanishing signal-to-noise ratio (SNR) as $N$ increases, which prevents effective optimization. To overcome this issue, we propose the novel VIMCO-$\star$ gradient estimator and show that it averts the SNR collapse of existing VIMCO gradient estimators by achieving a $\sqrt{N}$ SNR scaling instead. We demonstrate its superior empirical performance compared to current VIMCO implementations in challenging settings where reparameterized gradients are typically unavailable.
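
The statement that the importance weighted bound tightens with the number of Monte Carlo samples $N$ can be checked numerically on a toy problem. The sketch below is an illustration only and is not the paper's method: the unnormalized target exp(-z^2/2), the Gaussian proposal with standard deviation `sigma`, and the helper names `log_mean_exp` and `iw_bound` are all assumptions made for this example. It estimates the bound by plain Monte Carlo and involves no gradient estimator (neither reparameterized, REINFORCE, nor VIMCO).

```python
import numpy as np


def log_mean_exp(a, axis):
    """Numerically stable log of the mean of exp(a) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))


def iw_bound(n_samples, sigma=2.0, n_rep=20000, seed=0):
    """Monte Carlo estimate of E[log((1/N) * sum_i w_i)] with N = n_samples.

    Toy setting (assumed for illustration):
      unnormalized target  p~(z) = exp(-z^2 / 2), so Z = sqrt(2 * pi)
      proposal             q(z)  = Normal(0, sigma^2)
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=(n_rep, n_samples))
    log_p_tilde = -0.5 * z**2
    log_q = -0.5 * (z / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    log_w = log_p_tilde - log_q  # log importance weights
    # Average the per-replicate bound over n_rep independent replicates.
    return float(np.mean(log_mean_exp(log_w, axis=1)))


# Exact log normalizing constant of the toy target.
LOG_Z = 0.5 * np.log(2 * np.pi)

for n in (1, 8, 64):
    print(f"N={n:3d}  bound={iw_bound(n):.4f}  log Z={LOG_Z:.4f}")
```

With N = 1 the quantity reduces to the standard evidence lower bound; larger N should move the estimate up toward log Z, up to Monte Carlo error.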

artificial intelligence, estimator, machine learning, (18 more...)

2602.01412

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Guilhoto, Leonardo Ferreira, Kaushal, Akshat, Perdikaris, Paris

Multimodal Scientific Learning Beyond Diffusions and Flows

Scientific machine learning (SciML) increasingly requires models that capture multimodal conditional uncertainty arising from ill-posed inverse problems, multistability, and chaotic dynamics. While recent work has favored highly expressive implicit generative models such as diffusion and flow-based methods, these approaches are often data-hungry, computationally costly, and misaligned with the structured solution spaces frequently found in scientific problems. We demonstrate that Mixture Density Networks (MDNs) provide a principled yet largely overlooked alternative for multimodal uncertainty quantification in SciML. As explicit parametric density estimators, MDNs impose an inductive bias tailored to low-dimensional, multimodal physics, enabling direct global allocation of probability mass across distinct solution branches. This structure delivers strong data efficiency, allowing reliable recovery of separated modes in regimes where scientific data is scarce. We formalize these insights through a unified probabilistic framework contrasting explicit and implicit distribution networks, and demonstrate empirically that MDNs achieve superior generalization, interpretability, and sample efficiency across a range of inverse, multistable, and chaotic scientific regression tasks.
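
To illustrate what "explicit parametric density estimator" means here, the following is a minimal sketch of the log-likelihood of a one-dimensional Gaussian mixture, the quantity an MDN would typically maximize during training. It is not the authors' implementation: the function names `logsumexp` and `mdn_log_likelihood`, the array shapes, and the two-branch toy example are assumptions. In an actual MDN, `logits`, `means`, and `log_sigmas` would be outputs of a neural network evaluated at the input x; here they are fixed by hand.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log of the sum of exp(a) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def mdn_log_likelihood(y, logits, means, log_sigmas):
    """Per-example log p(y | x) under a 1-D Gaussian mixture.

    y has shape (batch,); logits, means, log_sigmas have shape (batch, K).
    """
    # Normalize the logits into log mixture weights (a log-softmax).
    log_pi = logits - logsumexp(logits, axis=1)[:, None]
    z = (y[:, None] - means) / np.exp(log_sigmas)
    log_comp = -0.5 * z**2 - log_sigmas - 0.5 * np.log(2 * np.pi)
    return logsumexp(log_pi + log_comp, axis=1)


# Toy two-branch target: solutions near +1 and -1 (e.g. y^2 = 1).
y = np.array([1.0, -1.0, 0.0])
logits = np.zeros((3, 2))                    # equal mixture weights
means = np.tile(np.array([1.0, -1.0]), (3, 1))
log_sigmas = np.full((3, 2), np.log(0.1))

ll = mdn_log_likelihood(y, logits, means, log_sigmas)
print(ll)
```

Both branches receive the same high log-likelihood, while the midpoint y = 0 (where a single-Gaussian regression mean would sit) receives very little probability mass.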

artificial intelligence, machine learning, multimodal scientific learning, (16 more...)

2602.0096

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry:

Energy (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

McAlinn, Kenichiro, Takanashi, Kōsaku

When Is Generalized Bayes Bayesian? A Decision-Theoretic Characterization of Loss-Based Updating

Loss-based updating, including generalized Bayes, Gibbs, and quasi-posteriors, replaces likelihoods by a user-chosen loss and produces a posterior-like distribution via exponential tilt. We give a decision-theoretic characterization that separates \emph{belief posteriors} -- conditional beliefs justified by the foundations of Savage and Anscombe-Aumann under a joint probability model -- from \emph{decision posteriors} -- randomized decision rules justified by preferences over decision rules. We make explicit that a loss-based posterior coincides with ordinary Bayes if and only if the loss is, up to scale and a data-only term, negative log-likelihood. We then show that generalized marginal likelihood is not evidence for decision posteriors, and Bayes factors are not well-defined without additional structure. In the decision posterior regime, non-degenerate posteriors require nonlinear preferences over decision rules. Under sequential coherence and separability, these lead to an entropy-penalized variational representation yielding generalized Bayes as the optimal rule.
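
The "if" direction of the coincidence statement can be written out in one line. The notation below is assumed for illustration and is not taken from the paper: $\pi$ is the prior, $\ell$ the loss, $\eta > 0$ the scale (learning rate), and $c(x)$ a term depending on the data only.

```latex
% Loss-based (exponentially tilted) posterior
\pi_\eta(\theta \mid x) \;\propto\; \pi(\theta)\,\exp\{-\eta\,\ell(\theta, x)\}

% Loss equal to negative log-likelihood up to scale and a data-only term
\ell(\theta, x) \;=\; -\tfrac{1}{\eta}\,\log p(x \mid \theta) + c(x)

% Substituting: the data-only factor is constant in \theta and is absorbed
% by normalization, leaving the ordinary Bayes posterior
\pi_\eta(\theta \mid x)
  \;\propto\; \pi(\theta)\,p(x \mid \theta)\,e^{-\eta\,c(x)}
  \;\propto\; \pi(\theta)\,p(x \mid \theta)
```

This covers only the easy direction; the converse is part of the paper's characterization and is not reproduced here.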

artificial intelligence, machine learning, posterior, (19 more...)

2602.01573

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Schlesinger, Dmitrij, Flach, Boris, Shekhovtsov, Alexander

Deep Multivariate Models with Parametric Conditionals

We consider deep multivariate models for heterogeneous collections of random variables. In the context of computer vision, such collections may consist of, e.g., images, segmentations, image attributes, and latent variables. When developing such models, most existing works start from an application task and design the model components and their dependencies to meet the needs of the chosen task. This has the disadvantage of limiting the applicability of the resulting model for other downstream tasks. Here, instead, we propose to represent the joint probability distribution by means of conditional probability distributions for each group of variables conditioned on the rest. Such models can then be used for practically any possible downstream task. Their learning can be approached as training a parametrised Markov chain kernel by maximising the data likelihood of its limiting distribution. This has the additional advantage of allowing a wide range of semi-supervised learning scenarios.

artificial intelligence, experiment, machine learning, (18 more...)

2602.01953

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Carlotti, Pietro, Gligić, Nevena, Farahi, Arya

Density-Informed Pseudo-Counts for Calibrated Evidential Deep Learning

Evidential Deep Learning (EDL) is a popular framework for uncertainty-aware classification that models predictive uncertainty via Dirichlet distributions parameterized by neural networks. Despite its popularity, its theoretical foundations and behavior under distributional shift remain poorly understood. In this work, we provide a principled statistical interpretation by proving that EDL training corresponds to amortized variational inference in a hierarchical Bayesian model with a tempered pseudo-likelihood. This perspective reveals a major drawback: standard EDL conflates epistemic and aleatoric uncertainty, leading to systematic overconfidence on out-of-distribution (OOD) inputs. To address this, we introduce Density-Informed Pseudo-count EDL (DIP-EDL), a new parametrization that decouples class prediction from the magnitude of uncertainty by separately estimating the conditional label distribution and the marginal covariate density. This separation preserves evidence in high-density regions while shrinking predictions toward a uniform prior for OOD data. Theoretically, we prove that DIP-EDL achieves asymptotic concentration. Empirically, we show that our method enhances interpretability and improves robustness and uncertainty calibration under distributional shift.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2602.01477

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Zero-Flow Encoders

Wang, Yakun, Wang, Leyang, Liu, Song, Suzuki, Taiji

Flow-based methods have achieved significant success in various generative modeling tasks, capturing nuanced details within complex data distributions. However, few existing works have exploited this unique capability to resolve fine-grained structural details beyond generation tasks. This paper presents a flow-inspired framework for representation learning. First, we demonstrate that a rectified flow trained using independent coupling is zero everywhere at $t=0.5$ if and only if the source and target distributions are identical. We term this property the \emph{zero-flow criterion}. Second, we show that this criterion can certify conditional independence, thereby extracting \emph{sufficient information} from the data. Third, we translate this criterion into a tractable, simulation-free loss function that enables learning amortized Markov blankets in graphical models and latent representations in self-supervised learning tasks. Experiments on both simulated and real-world datasets demonstrate the effectiveness of our approach. The code reproducing our experiments can be found at: https://github.com/probabilityFLOW/zfe.
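
The easy direction of the zero-flow criterion follows from a symmetry argument, sketched below. The notation is assumed for illustration ($x_0 \sim p$ source, $x_1 \sim q$ target, drawn independently, with the usual linear interpolation and conditional-expectation velocity of rectified flow); the converse direction is the paper's result and is not derived here.

```latex
% Linear interpolation and rectified-flow velocity under independent coupling
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
v(x, t) = \mathbb{E}\,[\,x_1 - x_0 \mid x_t = x\,]

% If p = q, the pair (x_0, x_1) is exchangeable. Swapping x_0 and x_1
% leaves x_{1/2} = (x_0 + x_1)/2 unchanged and flips the sign of x_1 - x_0:
v(x, \tfrac12)
  = \mathbb{E}\,[\,x_1 - x_0 \mid x_{1/2} = x\,]
  = \mathbb{E}\,[\,x_0 - x_1 \mid x_{1/2} = x\,]
  = -\,v(x, \tfrac12)

% Hence
v(x, \tfrac12) = 0 \quad \text{for all } x
```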

data mining, machine learning, natural language, (19 more...)

2602.00797

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States (0.14)
Europe > United Kingdom > England > Bristol (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

On the Power of (Approximate) Reward Models for Inference-Time Scaling

Zhu, Youheng, Lu, Yiping

Inference-time scaling has recently emerged as a powerful paradigm for improving the reasoning capability of large language models. Among various approaches, Sequential Monte Carlo (SMC) has become a particularly important framework, enabling iterative generation, evaluation, rejection, and resampling of intermediate reasoning trajectories. A central component in this process is the reward model, which evaluates partial solutions and guides the allocation of computation during inference. However, in practice, true reward models are never available. All deployed systems rely on approximate reward models, raising a fundamental question: Why and when do approximate reward models suffice for effective inference-time scaling? In this work, we provide a theoretical answer. We identify the Bellman error of the approximate reward model as the key quantity governing the effectiveness of SMC-based inference-time scaling. For a reasoning process of length $T$, we show that if the Bellman error of the approximate reward model is bounded by $O(1/T)$, then combining this reward model with SMC reduces the computational complexity of reasoning from exponential in $T$ to polynomial in $T$. This yields an exponential improvement in inference efficiency despite using only approximate rewards.

machine learning, natural language, proposal, (17 more...)

2602.01381

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)