Bayesian Learning


Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

arXiv.org Machine Learning

Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) lack the flexibility to fit certain anisotropic distributions. Bridging these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models, such as the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by allowing any sequence of eigenvalue multiplicities. If the multiplicities are prespecified, an expectation-maximization (EM) algorithm for learning the mixture parameters follows naturally. Otherwise, to address the notoriously challenging problem of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm and prove its monotonicity. We demonstrate the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering, and single-image denoising.
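To make the parsimony tradeoff concrete, the sketch below counts the free parameters of a p × p covariance whose eigenvalues are piecewise constant with multiplicities (γ_1, …, γ_d): d distinct eigenvalues, plus eigenvectors that are only identified up to rotations inside each repeated-eigenvalue block, i.e. p(p−1)/2 − Σ_k γ_k(γ_k−1)/2. This is an illustrative count under our own assumptions, not code from the paper; the helper names are hypothetical, and the mixture-level count assumes for simplicity that all components share one multiplicity sequence, which the paper's componentwise approach need not require.

```python
def covariance_param_count(multiplicities):
    """Free parameters of a p x p covariance whose eigenvalues take
    len(multiplicities) distinct values, repeated multiplicities[k] times."""
    p = sum(multiplicities)
    distinct_eigenvalues = len(multiplicities)
    # Eigenvectors are identified only up to rotations within each block of
    # equal eigenvalues (a flag manifold), hence the subtracted terms.
    eigvec_params = p * (p - 1) // 2 - sum(g * (g - 1) // 2 for g in multiplicities)
    return distinct_eigenvalues + eigvec_params


def gmm_param_count(num_components, multiplicities):
    """Hypothetical helper: K components that all share one multiplicity sequence."""
    p = sum(multiplicities)
    per_component = p + covariance_param_count(multiplicities)  # mean + covariance
    return num_components * per_component + (num_components - 1)  # + mixing weights


if __name__ == "__main__":
    p = 10
    for name, gammas in [
        ("full", (1,) * p),
        ("spherical", (p,)),
        ("PPCA, q=2", (1, 1, p - 2)),
        ("piecewise (2, 3, 5)", (2, 3, 5)),
    ]:
        print(f"{name:>20}: {covariance_param_count(gammas)} covariance parameters")
```

For p = 10 this gives 55 (full), 1 (spherical), 20 (PPCA with q = 2) and 34 for multiplicities (2, 3, 5); the all-ones and single-block sequences recover the full and spherical extremes, and (1, …, 1, p − q) recovers the usual PPCA count pq − q(q−1)/2 + 1.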


Variational Digital Twins

arXiv.org Artificial Intelligence

While digital twins (DTs) hold promise for providing real-time insights into complex energy assets, much of the current literature either lacks a clear framework for information exchange between the model and the asset, lacks key features needed for real-time implementation, or gives limited attention to model uncertainty. Here, we address these gaps by proposing a variational digital twin (VDT) framework that augments standard neural architectures with a single Bayesian output layer. This lightweight addition, together with a novel VDT updating algorithm, lets a twin update in seconds on commodity GPUs while producing calibrated uncertainty bounds that can inform experiment design, control algorithms, and assessments of model reliability. The VDT is evaluated on four energy-sector problems. For critical-heat-flux prediction, uncertainty-driven active learning reaches R² = 0.98 using 47% fewer experiments and one-third of the training time required by random sampling. A three-year renewable-generation twin maintains R² > 0.95 for solar output and curbs error growth for volatile wind forecasts through monthly updates that each process only one month of data. A nuclear reactor transient-cooldown twin reconstructs thermocouple signals with R² > 0.99 and preserves accuracy after 50% sensor loss, demonstrating robustness to degraded instrumentation. Finally, a physics-informed Li-ion battery twin, retrained after every ten discharges, lowers the voltage mean-squared error by an order of magnitude relative to the best static model while adapting its credible intervals as the cell approaches end-of-life. These results show that combining modest Bayesian augmentation with efficient update schemes turns conventional surrogates into uncertainty-aware, data-efficient, and computationally tractable DTs, paving the way for dependable models across industrial and scientific energy systems.
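A minimal sketch of the "single Bayesian output layer" idea, assuming features φ(x) from a frozen network and a conjugate Gaussian prior and noise model. We substitute exact Bayesian linear regression for the paper's variational treatment, which the abstract does not detail. Keeping the posterior in natural parameters (precision matrix A and vector b = A·mean) makes every update additive, so processing one month of data at a time yields exactly the same posterior as one batch fit. The class, method, and parameter names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


class BayesianOutputLayer:
    """Conjugate Bayesian linear output layer on frozen features phi(x).

    Prior w ~ N(0, I / prior_precision); likelihood y ~ N(w . phi, 1 / noise_precision).
    The posterior is stored as natural parameters: precision A and b = A @ mean.
    """

    def __init__(self, dim, prior_precision=1.0, noise_precision=25.0):
        self.beta = noise_precision
        self.A = [[prior_precision if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, features, targets):
        """Absorb a batch (e.g. one month of data); updates are purely additive."""
        for phi, y in zip(features, targets):
            for i, phi_i in enumerate(phi):
                self.b[i] += self.beta * phi_i * y
                for j, phi_j in enumerate(phi):
                    self.A[i][j] += self.beta * phi_i * phi_j

    def mean(self):
        return solve(self.A, self.b)

    def predict(self, phi):
        """Posterior predictive mean and variance (noise + weight uncertainty)."""
        mu = sum(p * w for p, w in zip(phi, self.mean()))
        var = 1.0 / self.beta + sum(p * v for p, v in zip(phi, solve(self.A, phi)))
        return mu, var


if __name__ == "__main__":
    # Hypothetical frozen features: phi(x) = [1, x] (bias plus one learned feature).
    stream = [([1.0, x / 10], 2.0 + 3.0 * x / 10) for x in range(20)]
    twin = BayesianOutputLayer(dim=2)
    print("prior predictive:", twin.predict([1.0, 0.5]))
    for month in (stream[:10], stream[10:]):  # process one chunk at a time
        twin.update(*zip(*month))
        print("after update:", twin.predict([1.0, 0.5]))
```

Only the d × d precision matrix and a length-d vector are touched per update, which is one plausible reason a last-layer-only Bayesian treatment can update quickly; the predictive variance shrinks as data accumulate.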


Gregorian melody, modality, and memory: Segmenting chant with Bayesian nonparametrics

arXiv.org Artificial Intelligence

The idea that Gregorian melodies are constructed from a vocabulary of segments has long been part of chant scholarship. This so-called "centonisation" theory has received much musicological criticism. Nevertheless, frequent re-use of certain melodic segments has been observed in chant melodies, and the intractable number of possible segmentations has left open the possibility that some undiscovered segmentation exists that would yet prove the value of centonisation; moreover, recent empirical results have shown that segmentations can outperform music-theoretical features in mode classification. Inspired by the fact that Gregorian chant was memorised, we search for an optimal unsupervised segmentation of chant melody using nested hierarchical Pitman-Yor language models. The segmentation we find achieves state-of-the-art performance in mode classification. Modeling a monk memorising the melodies from one liturgical manuscript, we then find empirical evidence for a link between mode classification and memory efficiency, and observe more formulaic areas at the beginnings and ends of melodies, corresponding to the practical role of modality in performance. However, the resulting segmentations themselves indicate that even such a memory-optimal segmentation is not what is understood as centonisation.
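For readers unfamiliar with Pitman-Yor language models, the sketch below gives the standard predictive rule of a single Pitman-Yor restaurant with discount d and concentration θ: a segment w is generated with probability (c_w − d·t_w + (θ + d·t)·P₀(w)) / (θ + c), where c_w and t_w count the customers and tables serving w, c and t are the totals, and P₀ is the base (parent) distribution. In a hierarchical model P₀ is itself another restaurant's predictive distribution; the nesting, the sampling of segmentations, and the paper's hyperparameters are not reproduced here, and the function name and segment strings are hypothetical.

```python
def pitman_yor_predictive(word, customers, tables, discount, concentration, base_prob):
    """Predictive probability of `word` in one Pitman-Yor restaurant.

    customers[w]: number of customers (observed tokens) of w in this context
    tables[w]:    number of tables serving w (1 <= tables[w] <= customers[w])
    base_prob:    parent/base distribution P0, a function word -> probability
    Requires 0 <= discount < 1 and concentration > -discount.
    """
    c = sum(customers.values())
    t = sum(tables.values())
    if c == 0:
        return base_prob(word)  # empty restaurant: defer entirely to the base
    # Mass on existing tables of `word`, each table discounted by `discount`.
    existing = customers.get(word, 0) - discount * tables.get(word, 0)
    # Mass reserved for opening a new table, whose dish is drawn from P0.
    new_table = (concentration + discount * t) * base_prob(word)
    return (existing + new_table) / (concentration + c)


if __name__ == "__main__":
    segments = ["fgf", "gag", "cde"]  # hypothetical melodic segments
    uniform = lambda w: 1.0 / len(segments)
    customers = {"fgf": 3, "gag": 1}
    tables = {"fgf": 1, "gag": 1}
    for s in segments:
        print(s, pitman_yor_predictive(s, customers, tables, 0.5, 1.0, uniform))
```

The discount d takes mass from frequently reused segments and hands it to the base distribution, which is what gives Pitman-Yor models their power-law "rich get richer, but not too rich" behavior.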


Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control

arXiv.org Machine Learning

We consider a Bayesian diffusion control problem of expected terminal utility maximization. The controller imposes a prior distribution on the unknown drift of an underlying diffusion. The Bayesian optimal control, which tracks the posterior distribution of the unknown drift, can be characterized explicitly. In practice, however, the prior will generally be misspecified, and the degree of misspecification can significantly affect policy performance. To mitigate this and reduce overpessimism, we introduce a distributionally robust Bayesian control (DRBC) formulation in which the controller plays a game against an adversary who selects a prior within a divergence neighborhood of a baseline prior. This adversarial approach has been studied in economics, and efficient algorithms have been proposed for static optimization settings. We develop a strong duality result for our DRBC formulation. Combining it with tools from stochastic analysis, we derive a loss that can be trained efficiently using a suitable neural network architecture, as our numerical experiments demonstrate. As a result, we obtain an effective algorithm for computing the DRBC optimal strategy. We further show that this computation simplifies greatly in the important case where the adversary chooses a prior from a Kullback-Leibler distributional uncertainty set.
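The Kullback-Leibler case can be illustrated in a static, finite-prior setting with the standard duality inf over {Q : KL(Q‖P) ≤ δ} of E_Q[U] = sup_{λ>0} {−λ log E_P[exp(−U/λ)] − λδ}, whose objective is concave in λ. In the sketch, P is a baseline prior over a few candidate drift values and U holds the expected terminal utility of one fixed policy under each drift; both are made-up numbers, and the crude grid search over λ is our simplification. This is not the paper's diffusion-control duality or its neural-network loss, only the scalar mechanism behind a KL uncertainty set, and the function name is hypothetical.

```python
import math


def kl_worst_case_expectation(probs, utilities, radius, num_grid=4000):
    """Dual value of min E_Q[U] over priors Q with KL(Q || P) <= radius.

    Evaluates sup_{lam > 0} -lam * log E_P[exp(-U / lam)] - lam * radius on a
    log-spaced grid of lam in [1e-4, 1e4]; assumes every probability is > 0.
    """
    u_min = min(utilities)
    best = -math.inf
    for k in range(num_grid):
        lam = 10.0 ** (-4.0 + 8.0 * k / (num_grid - 1))
        # Shift by u_min so the exponentials stay in (0, 1] (log-sum-exp trick).
        s = sum(p * math.exp(-(u - u_min) / lam) for p, u in zip(probs, utilities))
        best = max(best, u_min - lam * math.log(s) - lam * radius)
    return best


if __name__ == "__main__":
    prior = [0.2, 0.5, 0.3]     # hypothetical baseline prior over three drift values
    utility = [-1.0, 0.4, 1.2]  # hypothetical utility of one policy under each drift
    nominal = sum(p * u for p, u in zip(prior, utility))
    for delta in (0.0, 0.05, 0.2):
        print(delta, nominal, kl_worst_case_expectation(prior, utility, delta))
```

With δ = 0 the value approaches the nominal expectation, and it decreases toward the worst single drift as the radius grows; the maximizing λ also defines the adversarial prior Q ∝ P·exp(−U/λ), an exponential tilting of the baseline.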


GANs Secretly Perform Approximate Bayesian Model Selection

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) are popular and successful generative models. Despite their success, they are notoriously challenging to optimize and require regularization against overfitting. In this work, we explain the successes and limitations of GANs by interpreting them as probabilistic generative models. This interpretation lets us view GANs as Bayesian neural networks with partial stochasticity and establish conditions for universal approximation. We can then cast the adversarial-style optimization of several GAN variants as the optimization of a proxy for the marginal likelihood. Exploiting the connection between marginal likelihood optimization and Occam's razor, we define regularization and optimization strategies that smooth the loss landscape and search for solutions with minimum description length, which are associated with flat minima and good generalization. Results on a wide range of experiments indicate that these strategies improve performance and pave the way to a deeper understanding of regularization strategies for GANs.


Binned semiparametric Bayesian networks

arXiv.org Artificial Intelligence

This paper introduces a new type of probabilistic semiparametric model that uses data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability distributions are developed for the new binned semiparametric Bayesian networks: sparse binned kernel density estimation and Fourier kernel density estimation. These two distributions address the curse of dimensionality, which typically affects binned models, by using sparse tensors and restricting the number of parent nodes in conditional probability calculations. To evaluate the proposal, we perform a complexity analysis and conduct several comparative experiments using synthetic data and datasets from the UCI Machine Learning Repository. The experiments vary the binning rule, parent restrictions, grid size, and number of instances to give a holistic view of the model's behavior. Our binned semiparametric Bayesian networks achieve structure learning and log-likelihood estimates with no statistically significant differences from those of non-binned semiparametric Bayesian networks, but at much higher speed. The new binned semiparametric Bayesian networks thus prove to be a reliable and more efficient alternative to their non-binned counterparts.
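To illustrate why binning saves work, here is a one-dimensional sketch using the simplest binning rule (each sample snaps to its nearest grid point) and a Gaussian kernel: the density becomes a count-weighted sum over occupied grid points instead of over all samples, so evaluation cost scales with the number of non-empty bins rather than with the sample size. Keeping only non-empty bins loosely mirrors the sparse representation the abstract mentions. The paper's conditional (multi-parent) distributions, its Fourier variant, and its binning rules are not reproduced, and the function names are hypothetical.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def exact_kde(x, data, h):
    """Plain KDE: one kernel evaluation per sample."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def bin_data(data, lo, hi, num_bins):
    """Simple binning: each sample adds weight 1 to its nearest grid point."""
    step = (hi - lo) / (num_bins - 1)
    counts = [0] * num_bins
    for xi in data:
        idx = round((xi - lo) / step)
        counts[min(max(idx, 0), num_bins - 1)] += 1  # clamp out-of-range samples
    grid = [lo + k * step for k in range(num_bins)]
    # Store only occupied bins (a 1-D analogue of a sparse tensor).
    return [(g, c) for g, c in zip(grid, counts) if c > 0]


def binned_kde(x, bins, n, h):
    """Binned KDE: one kernel evaluation per occupied bin, weighted by its count."""
    return sum(c * gaussian_kernel((x - g) / h) for g, c in bins) / (n * h)


if __name__ == "__main__":
    data = [2.0 * math.sin(i) for i in range(500)]
    bins = bin_data(data, -4.0, 4.0, 401)
    print(len(data), "samples ->", len(bins), "occupied bins")
    for x in (-1.0, 0.0, 0.7):
        print(x, exact_kde(x, data, 0.5), binned_kde(x, bins, len(data), 0.5))
```

The approximation error is controlled by the grid spacing relative to the bandwidth: moving each sample by at most half a grid step changes each kernel term only slightly when the step is much smaller than h.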


Plastic tensor networks for interpretable generative modeling

arXiv.org Artificial Intelligence

A structural optimization scheme for a single-layer nonnegative adaptive tensor tree (NATT) that models a target probability distribution is proposed as an alternative paradigm for generative modeling. By construction, the NATT scheme automatically searches for the tree structure that best fits a given discrete dataset whose features serve as inputs, and it has the advantage of being interpretable as a probabilistic graphical model. We consider the NATT scheme and a recently proposed Born machine adaptive tensor tree (BMATT) optimization scheme and demonstrate their effectiveness on a variety of generative modeling tasks whose objective is to infer the hidden structure of a given dataset. Our results show that, in terms of minimizing the negative log-likelihood, the single-layer scheme performs comparably to, though not better than, the Born machine scheme. The tasks include deducing the structure of binary bitwise operations, learning the internal structure of random Bayesian networks given only visible sites, and a real-world hierarchical clustering example in which a cladogram is constructed from mitochondrial DNA sequences. In doing so, we also show the importance of the choice of network topology and the versatility of a least-mutual-information criterion for selecting a candidate structure for a tensor tree, and we discuss aspects of these tensor tree generative models, including their information content and interpretability.
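The least-mutual-information criterion is not spelled out in the abstract; as a building block, the sketch below computes the empirical mutual information (in nats) between two discrete feature columns, the kind of quantity such a structure criterion would presumably compare across candidate pairings. The function name and the toy bitwise data are hypothetical. The example also shows why pairwise statistics alone can be subtle on bitwise operations: the output of an XOR carries no pairwise information about either input, while AND does.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # I = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) p(y))), with counts substituted.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    xor = [i ^ j for i, j in zip(a, b)]
    and_ = [i & j for i, j in zip(a, b)]
    print("I(a; a)      =", mutual_information(a, a))     # log 2 for a fair bit
    print("I(a; a XOR b) =", mutual_information(a, xor))   # 0: pairwise independent
    print("I(a; a AND b) =", mutual_information(a, and_))  # positive
```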


Enhancing LLM Agent Safety via Causal Influence Prompting

arXiv.org Artificial Intelligence

As autonomous agents powered by large language models (LLMs) continue to demonstrate potential across various assistive tasks, ensuring their safe and reliable behavior is crucial for preventing unintended consequences. In this work, we introduce CIP, a novel technique that leverages causal influence diagrams (CIDs) to identify and mitigate risks arising from agent decision-making. CIDs provide a structured representation of cause-and-effect relationships, enabling agents to anticipate harmful outcomes and make safer decisions. Our approach consists of three key steps: (1) initializing a CID based on task specifications to outline the decision-making process, (2) guiding agent interactions with the environment using the CID, and (3) iteratively refining the CID based on observed behaviors and outcomes. Experimental results demonstrate that our method effectively enhances safety in both code execution and mobile device control tasks.
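The three CIP steps can be pictured with a toy causal influence diagram: typed decision, chance, and utility nodes joined by directed edges (step 1), a reachability check that flags decisions with a causal path to a harmful outcome before acting (step 2), and an edge-adding refinement after an observation (step 3). This is a hypothetical data-structure sketch, not the paper's implementation; judging by the method's name, the CID is supplied to the LLM agent through its prompt, which `to_prompt` only mimics, and every node name is invented.

```python
from collections import defaultdict, deque


class CausalInfluenceDiagram:
    """Toy CID: typed nodes plus directed cause -> effect edges."""

    def __init__(self):
        self.kind = {}                    # node -> "decision" | "chance" | "utility"
        self.children = defaultdict(set)  # node -> set of direct effects

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_edge(self, cause, effect):
        self.children[cause].add(effect)

    def influences(self, source, target):
        """Breadth-first search for a directed path source -> ... -> target."""
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                return True
            for nxt in self.children.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    def risky_decisions(self, harmful_outcomes):
        """Decisions with a causal path to any harmful outcome node."""
        return sorted(
            node for node, kind in self.kind.items()
            if kind == "decision" and any(self.influences(node, h) for h in harmful_outcomes)
        )

    def to_prompt(self):
        """Render edges as text that could be prepended to an agent's prompt."""
        edges = [f"{c} -> {e}" for c in sorted(self.children) for e in sorted(self.children[c])]
        return "Causal influence diagram:\n" + "\n".join(edges)


def build_initial_cid():
    # Step 1 (hypothetical): initialize from a task spec such as "clean up temp files".
    cid = CausalInfluenceDiagram()
    for name, kind in [("run_rm_command", "decision"), ("read_config", "decision"),
                       ("files_deleted", "chance"), ("user_data_loss", "utility"),
                       ("task_completed", "utility")]:
        cid.add_node(name, kind)
    cid.add_edge("run_rm_command", "files_deleted")
    cid.add_edge("files_deleted", "user_data_loss")
    cid.add_edge("files_deleted", "task_completed")
    cid.add_edge("read_config", "task_completed")
    return cid


if __name__ == "__main__":
    cid = build_initial_cid()
    print(cid.risky_decisions({"user_data_loss"}))  # step 2: consult before acting
    # Step 3: refine after observing that reading the config printed a secret.
    cid.add_node("secret_exposed", "utility")
    cid.add_edge("read_config", "secret_exposed")
    print(cid.risky_decisions({"user_data_loss", "secret_exposed"}))
    print(cid.to_prompt())
```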


Quantum Approximate Optimization Algorithm for Spatiotemporal Forecasting of HIV Clusters

arXiv.org Artificial Intelligence

HIV epidemiological data are increasingly complex, requiring advanced computation for accurate cluster detection and forecasting. We employed quantum-accelerated machine learning to analyze HIV prevalence at the ZIP-code level using AIDSVu data and synthetic social determinants of health (SDoH) data for 2022. Our approach compared classical clustering (DBSCAN, HDBSCAN) with a quantum approximate optimization algorithm (QAOA), developed a hybrid quantum-classical neural network for HIV prevalence forecasting, and used quantum Bayesian networks to explore causal links between SDoH factors and HIV incidence. The QAOA-based method achieved 92% accuracy in cluster detection within 1.6 seconds, outperforming the classical algorithms. The hybrid quantum-classical neural network predicted HIV prevalence with 94% accuracy, surpassing a purely classical counterpart. Quantum Bayesian analysis identified housing instability as a key driver of HIV cluster emergence and expansion, with stigma exerting a geographically variable influence. These quantum-enhanced methods deliver greater precision and efficiency in HIV surveillance while illuminating critical causal pathways. This work can guide targeted interventions, optimize resource allocation for PrEP, and help address structural inequities fueling HIV transmission.


Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows

arXiv.org Artificial Intelligence

Autoregressive models have driven remarkable progress in language modeling. Their foundational reliance on discrete tokens, unidirectional context, and single-pass decoding, while central to their success, also motivates exploring a design space that could offer new axes of modeling flexibility. In this work, we explore an alternative paradigm that shifts language modeling from a discrete token space to a continuous latent space. We propose TarFlowLM, a novel framework that employs transformer-based autoregressive normalizing flows to model these continuous representations. This approach unlocks substantial flexibility: it enables models that capture global bidirectional context through stacked, alternating-direction autoregressive transformations, support block-wise generation with flexible token patch sizes, and facilitate a hierarchical multi-pass generation process. We further propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data, and we demonstrate theoretical connections to conventional discrete autoregressive models. Extensive experiments on language modeling benchmarks demonstrate strong likelihood performance and highlight the flexible modeling capabilities inherent in our framework.
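A minimal sketch of the mechanism behind stacked, alternating-direction autoregressive flows: each layer applies an invertible affine map whose shift and log-scale at position i depend only on positions before i, so the Jacobian is triangular and its log-determinant is the sum of the per-position log-scales; reversing the sequence between layers lets later layers condition on what earlier layers treated as the future. The `conditioner` is a hypothetical scalar stand-in for a causal transformer, and TarFlowLM's mixture-based couplings and token embeddings are not reproduced; all function names are our own.

```python
import math


def conditioner(prefix):
    """Hypothetical stand-in for a causal transformer: (shift, log_scale) from x[:i]."""
    s = sum(prefix)
    return 0.5 * math.tanh(s), 0.3 * math.tanh(0.5 * s)


def af_forward(x):
    """One autoregressive affine layer x -> z; returns z and log|det dz/dx|."""
    z, logdet = [], 0.0
    for i, xi in enumerate(x):
        shift, log_scale = conditioner(x[:i])
        z.append((xi - shift) * math.exp(-log_scale))
        logdet -= log_scale  # triangular Jacobian with diagonal exp(-log_scale)
    return z, logdet


def af_inverse(z):
    """Sequential inverse: x_i needs x[:i], so decoding runs position by position."""
    x = []
    for zi in z:
        shift, log_scale = conditioner(x)
        x.append(zi * math.exp(log_scale) + shift)
    return x


def alternating_flow_forward(x, num_layers=2):
    """Stack layers, reversing the sequence after each one to alternate direction."""
    total_logdet = 0.0
    for _ in range(num_layers):
        x, logdet = af_forward(x)
        total_logdet += logdet
        x = x[::-1]  # a permutation, so |det| = 1
    return x, total_logdet


def alternating_flow_inverse(z, num_layers=2):
    # All layers share one conditioner here, so undoing them in this order is valid.
    for _ in range(num_layers):
        z = af_inverse(z[::-1])
    return z


def log_likelihood(x, num_layers=2):
    """Change of variables: log N(z; 0, I) + log|det dz/dx|."""
    z, logdet = alternating_flow_forward(x, num_layers)
    return sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z) + logdet


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8, 2.0]
    z, _ = alternating_flow_forward(x, num_layers=3)
    print("z =", z)
    print("reconstructed x =", alternating_flow_inverse(z, num_layers=3))
    print("log p(x) =", log_likelihood(x, num_layers=3))
```

Evaluating log p(x) needs one parallelizable pass per layer, while sampling must invert position by position; block-wise generation with larger patches is one way such models can trade off that sequential cost.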