

Bayesian Learning


Bayesian Deep Learning for Convective Initiation Nowcasting Uncertainty Estimation

arXiv.org Artificial Intelligence

This study evaluated the probability and uncertainty forecasts of five recently proposed Bayesian deep learning methods against a deterministic residual neural network (ResNet) baseline for 0-1 h convective initiation (CI) nowcasting from GOES-16 satellite infrared observations. Uncertainty was assessed by how well the probabilistic forecasts were calibrated and how well the uncertainty separated forecasts with large errors from those with small errors. Most of the Bayesian deep learning methods produced probabilistic forecasts that outperformed the deterministic ResNet. One of them, the initial-weights ensemble + Monte Carlo (MC) dropout (an ensemble of deterministic ResNets trained from different initial weights, with dropout kept active during inference), produced the most skillful and well-calibrated forecasts. It benefited from generating multiple solutions that sampled the hypothesis space more thoroughly. The Bayesian ResNet ensemble was the only method that performed worse than the deterministic ResNet at longer lead times, likely because of the difficulty of optimizing its larger number of parameters. To address this issue, the Bayesian-MOPED (MOdel Priors with Empirical Bayes using Deep neural network) ResNet ensemble was adopted; it improved forecast skill by constraining the hypothesis search to the neighborhood of the deterministic ResNet hypothesis. All Bayesian methods produced well-calibrated uncertainty and effectively separated cases with large and small errors. In case studies, the initial-weights ensemble + MC dropout showed better forecast skill than the Bayesian-MOPED ensemble and the deterministic ResNet on selected CI events in clear-sky regions. However, it generalized worse than the deterministic ResNet and the Bayesian-MOPED ensemble in clear-sky and anvil-cloud regions without CI occurrence.
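The "initial-weights ensemble + MC dropout" idea can be illustrated with a toy model: several members start from different random initial weights, dropout stays on at inference, and all stochastic passes are pooled into one predictive distribution. The sketch below assumes a tiny logistic model in place of a ResNet; the names `predict_with_dropout` and `ensemble_mc_dropout` are hypothetical, and the members are left untrained for brevity.

```python
import math
import random


def predict_with_dropout(weights, bias, x, drop_rate, rng):
    """One stochastic forward pass of a toy logistic model with inverted dropout on its inputs."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    z = bias
    for w, xi in zip(weights, x):
        if rng.random() >= drop_rate:          # keep this input with probability 1 - drop_rate
            z += w * xi / (1.0 - drop_rate)    # rescale kept inputs (inverted dropout)
    return 1.0 / (1.0 + math.exp(-z))          # sigmoid: probability of the positive class


def ensemble_mc_dropout(members, x, n_passes, drop_rate, seed=0):
    """Pool n_passes dropout samples from every member; return (mean, std) of the probability."""
    rng = random.Random(seed)
    probs = [predict_with_dropout(w, b, x, drop_rate, rng)
             for (w, b) in members
             for _ in range(n_passes)]
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, math.sqrt(var)


# Hypothetical members: each gets its own random initial weights. In a real
# ensemble each member would then be trained starting from that point.
members = []
for k in range(5):
    init_rng = random.Random(k)
    members.append(([init_rng.gauss(0.0, 1.0) for _ in range(3)], init_rng.gauss(0.0, 1.0)))

mean_p, spread = ensemble_mc_dropout(members, [0.5, -1.0, 2.0], n_passes=20, drop_rate=0.2)
```

The mean serves as the probabilistic forecast and the spread as an uncertainty estimate; pooling across members captures disagreement between initializations, while the dropout passes add within-member variability.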


Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support

arXiv.org Artificial Intelligence

A common approach for handling missing values in data analysis pipelines is multiple imputation via software packages such as MICE (Van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011). These packages typically assume the data are missing at random (MAR) and impose parametric or smoothing assumptions on the imputing distributions in a way that allows imputation to proceed even if not all missingness patterns have support in the data. Such assumptions are unrealistic in practice and induce model misspecification bias in any analysis performed after such imputation. In this paper, we provide a principled alternative. Specifically, we develop a new characterization of the full data law in graphical models of missing data. This characterization is constructive, is easily adapted to calculate imputation distributions for both MAR and MNAR (missing not at random) mechanisms, and can handle lack of support for certain patterns of missingness. We use this characterization to develop a new imputation algorithm, Multivariate Imputation via Supported Pattern Recursion (MISPR), which uses Gibbs sampling by analogy with the Multivariate Imputation by Chained Equations (MICE) algorithm, but which is consistent under both MAR and MNAR settings and can handle missing data patterns with no support without imposing additional assumptions beyond those already imposed by the missing data model itself. In simulations, we show that MISPR obtains results comparable to those of MICE when data are MAR, and superior, less biased results when data are MNAR. Our characterization and the imputation algorithm based on it are a step towards making principled missing data methods more practical in applied settings, where the data are likely both MNAR and sufficiently high-dimensional to yield missing data patterns with no support at available sample sizes.
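The notion of a missingness pattern "with no support" is easy to make concrete: encode each row by which variables are missing and count how often each pattern occurs. The sketch below only does this bookkeeping; it is not MISPR itself. It assumes `None` marks a missing value and uses hypothetical helper names `pattern_support` and `unsupported_patterns`.

```python
from collections import Counter
from itertools import product


def pattern_support(rows):
    """Count each missingness pattern; in a pattern, 1 means missing and 0 means observed."""
    return Counter(tuple(int(v is None) for v in row) for row in rows)


def unsupported_patterns(rows, n_vars):
    """List every possible pattern over n_vars variables that never occurs in the data."""
    counts = pattern_support(rows)
    return [p for p in product((0, 1), repeat=n_vars) if counts[p] == 0]


rows = [(1.2, None, 3.0), (None, 0.4, 2.2), (0.9, 1.1, 2.0), (0.5, None, 1.0)]
print(pattern_support(rows))
print(unsupported_patterns(rows, 3))
```

With three variables there are 2^3 = 8 possible patterns, and this toy data set supports only three of them; the number of possible patterns grows exponentially with dimension, which is why unsupported patterns become unavoidable at realistic sample sizes.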


Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks

arXiv.org Artificial Intelligence

Calibrating the confidence of neural network classifiers is essential for quantifying the reliability of their predictions during inference. However, conventional Gaussian Process (GP) calibration methods often fail to capture the internal hierarchical structure of deep neural networks, limiting both interpretability and effectiveness for assessing predictive reliability. We propose a Semantic-Aware Layer-wise Gaussian Process (SAL-GP) framework that mirrors the layered architecture of the target neural network. Instead of applying a single global GP correction, SAL-GP employs a multi-layer GP model, where each layer's feature representation is mapped to a local calibration correction. These layerwise GPs are coupled through a structured multi-layer kernel, enabling joint marginalization across all layers. This design allows SAL-GP to capture both local semantic dependencies and global calibration coherence, while consistently propagating predictive uncertainty through the network. The resulting framework enhances interpretability aligned with the network architecture and enables principled evaluation of confidence consistency and uncertainty quantification in deep models.
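A common way to quantify the miscalibration that GP-based methods such as SAL-GP aim to correct is the binned expected calibration error (ECE): predictions are grouped by confidence, and the gap between accuracy and mean confidence is averaged over bins. This is the standard metric, not the SAL-GP method; the bin count and function name below are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)   # confidence 1.0 falls into the last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


# An overconfident toy classifier: 95% confidence, but only 60% of predictions correct.
print(expected_calibration_error([0.95] * 10, [True] * 6 + [False] * 4))
```

A perfectly calibrated model has ECE near zero; a calibration layer is judged by how much it reduces this gap without hurting accuracy.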


An open dataset of neural networks for hypernetwork research

arXiv.org Artificial Intelligence

Despite the transformative potential of AI, the concept of neural networks that can produce other neural networks by generating model weights (hypernetworks) has received relatively little study. One possible reason is the lack of research resources that can be used for hypernetwork research. Here we describe a dataset of neural networks designed for the purpose of hypernetwork research. The dataset includes $10^4$ LeNet-5 neural networks trained for binary image classification and divided into 10 classes, such that each class contains 1,000 different neural networks that distinguish a particular ImageNette V2 class from all other classes. A computing cluster of over $10^4$ cores was used to generate the dataset. Basic classification results show that the neural networks can be classified with an accuracy of 72.0%, indicating that the differences between the neural networks can be identified by supervised machine learning algorithms. The ultimate purpose of the dataset is to enable hypernetwork research. The dataset and the code that generates it are openly available to the public.
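The dataset's organization can be sketched in a few lines: each trained network is a one-vs-rest binary classifier for one of 10 ImageNette classes, and 10 classes with 1,000 networks each gives the stated $10^4$ networks. The manifest layout and helper names below are hypothetical, not the released dataset's actual format. With 10 balanced classes, chance accuracy for identifying a network's class is 10%, which puts the reported 72.0% in context.

```python
N_CLASSES = 10          # ImageNette V2 classes
NETS_PER_CLASS = 1000   # independently trained networks per class
CHANCE_ACCURACY = 1 / N_CLASSES


def one_vs_rest_labels(class_ids, target_class):
    """Binary training labels for a network that detects target_class (1) vs. all others (0)."""
    return [1 if c == target_class else 0 for c in class_ids]


def dataset_manifest():
    """Hypothetical manifest: one (class_id, replicate_index) entry per trained network."""
    return [(c, k) for c in range(N_CLASSES) for k in range(NETS_PER_CLASS)]


print(len(dataset_manifest()), CHANCE_ACCURACY)
```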


SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

arXiv.org Artificial Intelligence

This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness depends on high-quality, suitable benchmarks. We release new sense-annotated datasets of sentences containing polysemous words, spanning ten low-resource languages across diverse language families and scripts. To facilitate dataset creation, the paper presents a demonstrably beneficial semi-automatic annotation method. The utility of the datasets is demonstrated through Word-in-Context (WiC) formatted experiments that evaluate transfer on these low-resource languages. Results highlight the importance of targeted dataset creation and evaluation for effective polysemy disambiguation in low-resource settings and transfer studies. The released datasets and code aim to support further research into fair, robust, and truly multilingual NLP.
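The Word-in-Context (WiC) format turns sense-annotated sentences into a binary task: given two sentences sharing a target word, decide whether the word has the same sense in both. A minimal sketch of that conversion follows, assuming input tuples of `(sentence, target_word, sense_id)`; the English sentences and sense labels are illustrative only, and `build_wic_pairs` is a hypothetical name.

```python
from itertools import combinations


def build_wic_pairs(annotated):
    """Pair sentences sharing a target word; label each pair True if the senses match."""
    by_word = {}
    for sentence, word, sense in annotated:
        by_word.setdefault(word, []).append((sentence, sense))
    pairs = []
    for word, items in by_word.items():
        for (s1, k1), (s2, k2) in combinations(items, 2):
            pairs.append((s1, s2, word, k1 == k2))
    return pairs


data = [
    ("We sat on the river bank.", "bank", "bank.n.01"),
    ("She deposited cash at the bank.", "bank", "bank.n.02"),
    ("The bank approved the loan.", "bank", "bank.n.02"),
    ("Turn on the light.", "light", "light.n.01"),
]
for pair in build_wic_pairs(data):
    print(pair)
```

Words with only one annotated sentence yield no pairs, so the number of usable examples depends on how many sentences per polysemous word the annotation provides.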


Adaptive Bayesian Single-Shot Quantum Sensing

arXiv.org Artificial Intelligence

Quantum sensing harnesses the unique properties of quantum systems to enable precision measurements of physical quantities such as time, magnetic and electric fields, acceleration, and gravitational gradients well beyond the limits of classical sensors. However, identifying suitable sensing probes and measurement schemes can be a classically intractable task, as it requires optimizing over Hilbert spaces of high dimension. In variational quantum sensing, a probe quantum system is generated via a parameterized quantum circuit (PQC), exposed to an unknown physical parameter through a quantum channel, and measured to collect classical data. PQCs and measurements are typically optimized using offline strategies based on frequentist learning criteria. This paper introduces an adaptive protocol that uses Bayesian inference to optimize the sensing policy via the maximization of the active information gain. The proposed variational methodology is tailored for non-asymptotic regimes where a single probe can be deployed in each time step, and is extended to support the fusion of estimates from multiple quantum sensing agents.
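The core loop of adaptive Bayesian sensing (choose the measurement that maximizes expected information gain, take a single shot, update the posterior) can be shown on a toy problem. The sketch below assumes a single-qubit phase model with $P(0 \mid \theta, \phi) = (1 + \cos(\theta - \phi))/2$, a grid posterior over $\theta \in [0, \pi)$, and a finite set of candidate settings $\phi$. It is not the paper's variational PQC protocol; all function names are hypothetical.

```python
import math
import random


def likelihood(outcome, theta, phi):
    """Toy single-qubit model: P(0 | theta, phi) = (1 + cos(theta - phi)) / 2."""
    p0 = 0.5 * (1.0 + math.cos(theta - phi))
    return p0 if outcome == 0 else 1.0 - p0


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) outcome."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def info_gain(posterior, grid, phi):
    """Mutual information I(outcome; theta) = H(outcome) - E_theta[H(outcome | theta)]."""
    p0_each = [likelihood(0, t, phi) for t in grid]
    p0 = sum(w * q for w, q in zip(posterior, p0_each))
    expected_cond = sum(w * binary_entropy(q) for w, q in zip(posterior, p0_each))
    return binary_entropy(p0) - expected_cond


def bayes_update(posterior, grid, phi, outcome):
    """Multiply the grid posterior by the outcome likelihood and renormalize."""
    unnorm = [w * likelihood(outcome, t, phi) for w, t in zip(posterior, grid)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def adaptive_sensing(theta_true, n_steps, n_grid=200, n_phis=50, seed=0):
    """Each step: pick the phi with maximal expected information gain, take one shot, update."""
    rng = random.Random(seed)
    grid = [math.pi * i / n_grid for i in range(n_grid)]   # theta assumed to lie in [0, pi)
    phis = [math.pi * j / n_phis for j in range(n_phis)]   # candidate measurement settings
    posterior = [1.0 / n_grid] * n_grid                    # uniform prior
    for _ in range(n_steps):
        phi = max(phis, key=lambda f: info_gain(posterior, grid, f))
        # simulate one single-shot measurement of a probe exposed to the true parameter
        outcome = 0 if rng.random() < likelihood(0, theta_true, phi) else 1
        posterior = bayes_update(posterior, grid, phi, outcome)
    estimate = sum(w * t for w, t in zip(posterior, grid))  # posterior mean
    return estimate, posterior


estimate, posterior = adaptive_sensing(theta_true=1.0, n_steps=30)
```

Because the setting is re-chosen after every shot using the current posterior, this is an online, non-asymptotic strategy, in contrast to optimizing a fixed measurement offline.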


Assessing Adaptive World Models in Machines with Novel Games

arXiv.org Artificial Intelligence

Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remain narrow, often focusing on static representations learned from training on massive corpora of data rather than on the efficiency and efficacy of learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures; we refer to this class of games as novel games. We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent's ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization, a critical component of artificial general intelligence.


Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data

arXiv.org Machine Learning

The Design of Experiments (DOE) is a fundamental scientific methodology that provides researchers with systematic principles and techniques to enhance the validity, reliability, and efficiency of experimental outcomes. In this study, we explore optimal experimental design within a Bayesian framework, using Bayes' theorem to reformulate the utility expectation, originally expressed as a nested double integral, into an independent double integral form, which significantly improves numerical efficiency. To further accelerate the computation of the proposed utility expectation, conditional density estimation is employed to approximate the ratio of two Gaussian random fields, while covariance serves as a selection criterion to identify informative data sets during model fitting and integral evaluation. In scenarios characterized by low simulation efficiency and high costs of raw data acquisition, key challenges such as surrogate modeling, failure probability estimation, and parameter inference are systematically restructured within the Bayesian experimental design framework. The effectiveness of the proposed methodology is validated through both theoretical analysis and practical applications, demonstrating its potential for enhancing experimental efficiency and decision-making under uncertainty.
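To see why the nested form is expensive, consider the standard nested Monte Carlo estimator of expected information gain, $\mathrm{EIG}(d) \approx \frac{1}{N}\sum_{n} \big[\log p(y_n \mid \theta_n, d) - \log \frac{1}{M}\sum_{m} p(y_n \mid \theta_m, d)\big]$, which costs $N \times M$ likelihood evaluations. The sketch below applies it to an assumed linear-Gaussian model, $y = d\,\theta + \varepsilon$ with $\theta \sim \mathcal{N}(0, 1)$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, where the exact answer $\tfrac{1}{2}\log(1 + d^2/\sigma^2)$ is available for comparison. This is the baseline estimator, not the paper's reformulation; it reuses one set of inner prior samples across outer samples to save draws.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def nested_mc_eig(d, sigma, n_outer=1000, n_inner=500, seed=0):
    """Nested Monte Carlo EIG for y = d * theta + noise, theta ~ N(0, 1), noise ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    inner_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_inner)]  # prior samples for the evidence
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)                   # outer prior draw
        y = d * theta + rng.gauss(0.0, sigma)         # simulated observation
        log_lik = log_normal_pdf(y, d * theta, sigma)
        # log p(y | d) estimated by averaging the likelihood over prior samples (log-sum-exp)
        logs = [log_normal_pdf(y, d * t, sigma) for t in inner_thetas]
        top = max(logs)
        log_evidence = top + math.log(sum(math.exp(v - top) for v in logs) / n_inner)
        total += log_lik - log_evidence
    return total / n_outer


def analytic_eig(d, sigma):
    """Closed-form EIG of the linear-Gaussian model."""
    return 0.5 * math.log(1.0 + d ** 2 / sigma ** 2)


print(nested_mc_eig(1.0, 0.5), analytic_eig(1.0, 0.5))
```

The inner average is the costly part; avoiding or approximating it (for example with a learned density or density-ratio model) is what makes faster design optimization possible.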


Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis

arXiv.org Machine Learning

The detection of gravitational waves by the LIGO-Virgo-KAGRA collaboration has ushered in a new era of observational astronomy, emphasizing the need for rapid and detailed parameter estimation and population-level analyses. Traditional Bayesian inference methods, particularly Markov chain Monte Carlo, face significant computational challenges when dealing with the high-dimensional parameter spaces and complex noise characteristics inherent in gravitational wave data. This review examines the emerging role of simulation-based inference methods in gravitational wave astronomy, with a focus on approaches that leverage machine-learning techniques such as normalizing flows and neural posterior estimation. We provide a comprehensive overview of the theoretical foundations underlying various simulation-based inference methods, including neural posterior estimation, neural ratio estimation, neural likelihood estimation, flow matching, and consistency models. We explore the applications of these methods across diverse gravitational wave data processing scenarios, from single-source parameter estimation and overlapping signal analysis to testing general relativity and conducting population studies. Although these techniques demonstrate speed improvements over traditional methods in controlled studies, their model-dependent nature and sensitivity to prior assumptions are barriers to their widespread adoption. Their accuracy, which is similar to that of conventional methods, requires further validation across broader parameter spaces and noise conditions.
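The principle shared by simulation-based inference methods (learning about parameters only by running a simulator, never evaluating a likelihood) is clearest in its oldest form, rejection approximate Bayesian computation (ABC). The sketch below is plain rejection ABC, not neural posterior, ratio, or likelihood estimation; it assumes a toy Gaussian simulator with unknown mean, a uniform prior, and the sample mean as summary statistic.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator: n draws from a unit-variance Gaussian with unknown mean mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def rejection_abc(observed, n_sims=20000, eps=0.1, prior=(-5.0, 5.0), seed=0):
    """Keep prior draws whose simulated summary lands within eps of the observed summary."""
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)            # sample mean as summary statistic
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(*prior)                         # candidate drawn from the prior
        sim_summary = statistics.fmean(simulate(mu, len(observed), rng))
        if abs(sim_summary - obs_summary) < eps:         # simulation resembles the data
            accepted.append(mu)
    return accepted                                      # approximate posterior samples


data_rng = random.Random(42)
observed = [data_rng.gauss(1.5, 1.0) for _ in range(20)]
posterior_samples = rejection_abc(observed)
```

Only about 2% of simulations are accepted here, and acceptance collapses as the parameter and data dimensions grow; neural methods such as neural posterior estimation address exactly this by learning the posterior from all simulations instead of discarding most of them.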


Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators

arXiv.org Machine Learning

Hamiltonian Monte Carlo (HMC) is a powerful and accurate method for sampling from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the parameter space and the non-convexity of the posterior distribution. Therefore, approximation techniques such as variational inference (VI) or stochastic gradient MCMC are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach starts with VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating full-batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps upstream conditions to wall-pressure data on a cone in hypersonic flow.
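The "HMC on a subset of parameters" idea can be sketched by running standard leapfrog HMC over only the selected coordinates while holding the rest fixed (for instance at their VI posterior means). The sketch below assumes a toy target of independent Gaussians and a hand-picked `active` set; how the paper ranks parameters by their influence on predictive uncertainty is not reproduced, and `hmc_subset` is a hypothetical name.

```python
import math
import random


def hmc_subset(log_p, grad_log_p, x0, active, n_samples, step=0.1, n_leapfrog=20, seed=0):
    """HMC that moves only the coordinates listed in `active`; all others stay at x0."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        p_init = {i: rng.gauss(0.0, 1.0) for i in active}   # momenta for active coords only
        p = dict(p_init)
        x_new = list(x)
        g = grad_log_p(x_new)
        for i in active:                                    # initial half step for momentum
            p[i] += 0.5 * step * g[i]
        for step_idx in range(n_leapfrog):
            for i in active:                                # full step for position
                x_new[i] += step * p[i]
            g = grad_log_p(x_new)
            w = 0.5 if step_idx == n_leapfrog - 1 else 1.0  # final momentum update is a half step
            for i in active:
                p[i] += w * step * g[i]
        # Metropolis correction on the Hamiltonian H = -log p(x) + |p|^2 / 2
        h_old = -log_p(x) + 0.5 * sum(v * v for v in p_init.values())
        h_new = -log_p(x_new) + 0.5 * sum(v * v for v in p.values())
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(list(x))
    return samples


# Toy target: independent Gaussians. Coordinate 1 is frozen at a fixed value,
# standing in for a parameter judged not to matter for predictive uncertainty.
MU = [1.0, -2.0, 3.0]
SD = [1.0, 1.0, 0.5]


def log_p(x):
    return -0.5 * sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, MU, SD))


def grad_log_p(x):
    return [-(xi - m) / s ** 2 for xi, m, s in zip(x, MU, SD)]


samples = hmc_subset(log_p, grad_log_p, [0.0, -2.0, 0.0], active=[0, 2], n_samples=2000)
```

Each leapfrog step still evaluates the full gradient, but the momentum, kinetic energy, and exploration live only in the reduced space, so the sampler's dimension scales with the active subset rather than with the full network.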