Bayesian Learning


Bayesian Learning via Q-Exponential Process

Neural Information Processing Systems

Regularization is one of the most fundamental topics in optimization, statistics and machine learning. To induce sparsity in estimating a parameter $u\in\mathbb{R}^d$, an $\ell_q$ penalty term, $\Vert u\Vert_q$, is usually added to the objective function. What is the probabilistic distribution corresponding to such an $\ell_q$ penalty?
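The sparsity-inducing role of the $\ell_q$ penalty can be seen numerically. The sketch below (illustrative only; the design matrix `A`, weight `lam`, and exponent `q` are hypothetical names, not from the paper) compares a least-squares objective with an $\ell_q^q$ penalty on a sparse versus a dense parameter vector: for $q<1$ the sparse vector incurs the smaller penalty.

```python
import numpy as np

def lq_objective(u, A, y, lam, q):
    """Least-squares loss plus an l_q penalty, lam * ||u||_q^q.
    Smaller q (e.g. q = 0.5) promotes sparsity more aggressively
    than the lasso (q = 1) or ridge (q = 2) penalties."""
    residual = A @ u - y
    return 0.5 * np.sum(residual**2) + lam * np.sum(np.abs(u) ** q)

# Compare a sparse and a dense parameter of equal Euclidean norm.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
u_sparse = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
u_dense = np.full(5, 3.0 / np.sqrt(5))
y = A @ u_sparse  # data generated by the sparse parameter

# With q < 1 the sparse vector achieves the smaller objective.
print(lq_objective(u_sparse, A, y, lam=1.0, q=0.5) <
      lq_objective(u_dense, A, y, lam=1.0, q=0.5))  # True
```

The paper's question is then which prior distribution makes this penalized objective a maximum-a-posteriori estimate.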


Bayesian Learning of Sum-Product Networks

Neural Information Processing Systems

Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: Even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc and based on intuition rather than a clear learning principle. In this paper, we introduce a well-principled Bayesian framework for SPN structure learning.


Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems

In this paper, we present a new practical method for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel at different temperatures and periodically swapping them. When evolving the replicas' states, the Nos\'e-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol with a noise-aware acceptance test is developed, by which detailed balance is preserved asymptotically. The method's efficacy on complex multimodal posteriors is illustrated on synthetic distributions, and experiments with deep Bayesian neural networks on large-scale datasets show significant improvements over strong baselines.
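For context, the standard full-batch replica-exchange (parallel tempering) swap test that the paper's noise-aware protocol generalizes can be sketched as follows. This is the textbook Metropolis criterion, not the paper's minibatch test; the energies `E_i`, `E_j` here are assumed to be exact, whereas the paper's contribution is tolerating noisy mini-batch estimates of them.

```python
import math
import random

def swap_accept(E_i, E_j, T_i, T_j):
    """Standard replica-exchange acceptance test: swap the states of
    replicas at temperatures T_i and T_j with probability
    min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))).
    E_i, E_j are the (exact, full-batch) energies of the two states."""
    log_ratio = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    # min(0, .) inside exp avoids overflow and equals min(1, exp(.)).
    return random.random() < math.exp(min(0.0, log_ratio))
```

When the colder replica holds the higher-energy state (e.g. `swap_accept(10.0, 1.0, 1.0, 5.0)`), the log-ratio is positive and the swap is always accepted, which is what lets low-temperature chains escape isolated modes.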


Decentralized Langevin Dynamics for Bayesian Learning

Neural Information Processing Systems

Motivated by decentralized approaches to machine learning, we propose a collaborative Bayesian learning algorithm taking the form of decentralized Langevin dynamics in a non-convex setting. Our analysis shows that the initial KL-divergence between the Markov chain and the target posterior distribution decreases exponentially, while the error contributions to the overall KL-divergence from the additive noise decrease polynomially in time. We further show that the polynomial term enjoys a speed-up with the number of agents, and provide sufficient conditions on the time-varying step sizes to guarantee convergence to the desired distribution. The performance of the proposed algorithm is evaluated on a wide variety of machine learning tasks. The empirical results show that the performance of individual agents with locally available data is on par with the centralized setting, with considerable improvement in the convergence rate.
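A generic decentralized Langevin iteration can be sketched as below. This is an illustrative form, not the paper's exact recursion: each agent averages its neighbors' iterates through a doubly stochastic mixing matrix `W`, takes a gradient step on its local negative log-posterior, and injects Gaussian noise. The names and the toy standard-Gaussian target are assumptions for the demo.

```python
import numpy as np

def decentralized_langevin_step(X, W, local_grad, step, rng):
    """One round of a decentralized Langevin update (illustrative).
    X: (n_agents, d) current iterates.
    W: (n_agents, n_agents) doubly stochastic mixing matrix.
    local_grad: callable (agent index i, iterate x) -> gradient of
        agent i's local negative log-posterior at x."""
    G = np.stack([local_grad(i, X[i]) for i in range(X.shape[0])])
    noise = rng.normal(size=X.shape)
    # Consensus averaging + local gradient step + Langevin noise.
    return W @ X - step * G + np.sqrt(2.0 * step) * noise

# Toy run: 3 agents targeting a standard Gaussian (grad of -log p is x),
# communicating over a complete graph, initialized far from the mode.
rng = np.random.default_rng(1)
W = np.full((3, 3), 1.0 / 3.0)
X = rng.normal(size=(3, 2)) + 5.0
for _ in range(500):
    X = decentralized_langevin_step(X, W, lambda i, x: x, 0.1, rng)
```

After the burn-in the agents' iterates concentrate around the mode at the origin, illustrating the consensus-plus-sampling behavior the abstract analyzes.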


Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Neural Information Processing Systems

Models of many real-life applications, such as queueing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter $\theta\in\Theta$, and defined on a countably-infinite state-space $\mathcal X=\mathbb{Z}_+^d$, with finite action space $\mathcal A$, and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter $\boldsymbol{\theta}^*$ generated via a given fixed prior distribution on $\Theta$. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes' rule is used to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From this condition and using the solution of the average cost Bellman equation, we establish an $\tilde O(dh^d\sqrt{|\mathcal A|T})$ upper bound on the Bayesian regret of our algorithm, where $T$ is the time-horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queueing models with unknown dynamics, and show that our algorithm can be applied to develop approximately optimal control algorithms.
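The episode structure of the algorithm can be sketched generically. The skeleton below is illustrative only: `posterior_sample`, `solve_policy`, and `step_env` are hypothetical placeholders (not the paper's API), and linearly growing episode lengths stand in for the paper's dynamic sizing rule.

```python
def thompson_sampling_episodes(posterior_sample, solve_policy, step_env,
                               horizon, rng):
    """Skeleton of Thompson sampling with dynamically sized episodes.
    At the start of episode k, draw theta_k from the current posterior,
    compute the (approximately) optimal policy for theta_k, and follow
    it for the whole episode; here episode lengths grow as 1, 2, 3, ..."""
    t, k, history = 0, 0, []
    while t < horizon:
        k += 1
        theta_k = posterior_sample(history, rng)  # posterior via Bayes' rule
        policy = solve_policy(theta_k)            # e.g. avg-cost Bellman eq.
        episode_len = k                           # placeholder sizing rule
        for _ in range(min(episode_len, horizon - t)):
            history.append(step_env(policy))      # act, observe transition
            t += 1
    return k, history

# With horizon 10 and lengths 1, 2, 3, 4, exactly 4 episodes occur.
k, hist = thompson_sampling_episodes(lambda h, r: 0.0, lambda th: None,
                                     lambda pi: 1, 10, None)
```

Growing episode lengths mean the policy is re-solved only logarithmically-to-sublinearly often, which is what makes the regret analysis over the horizon $T$ tractable.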


RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing

Xu, Wandi, Xie, Wei

arXiv.org Machine Learning

Biopharmaceutical manufacturing is vital to public health but lacks the agility for rapid, on-demand production of biotherapeutics due to the complexity and variability of bioprocesses. To overcome this, we introduce RApid PosTeriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework designed to accelerate intelligent digital twin development from sparse and heterogeneous experimental data. This framework is built on a multi-scale probabilistic knowledge graph (pKG), formulated as a stochastic differential equation (SDE)-based foundational model that captures the nonlinear dynamics of bioprocesses. RAPTOR-GEN consists of two ingredients: (i) an interpretable metamodel integrating linear noise approximation (LNA) that exploits the structural information of bioprocessing mechanisms and a sequential learning strategy to fuse heterogeneous and sparse data, enabling inference of latent state variables and explicit approximation of the intractable likelihood function; and (ii) an efficient Bayesian posterior sampling method that utilizes Langevin diffusion (LD) to accelerate posterior exploration by exploiting the gradients of the derived likelihood. It generalizes the LNA approach to circumvent the challenge of step size selection, facilitating robust learning of mechanistic parameters with provable finite-sample performance guarantees. We develop a fast and robust RAPTOR-GEN algorithm with controllable error. Numerical experiments demonstrate its effectiveness in uncovering the underlying regulatory mechanisms of biomanufacturing processes. Funding: This research was supported by National Science Foundation Grant CAREER CMMI-2442970 and National Institute of Standards and Technology Grant 70NANB21H086.


Review for NeurIPS paper: Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems

Summary and Contributions: The paper considers the problem of sampling from the posterior distribution in Bayesian inference. To be more precise, the paper approaches the question of stochastic sampling that relies only on minibatches of data at each iteration. To achieve rapid mixing between isolated modes, the authors consider parallel tempered chains and introduce replica-exchange steps into the stochastic Nos\'e-Hoover dynamics. The crux of this approach is the stochastic test for the replica-exchange step. To develop such a test, the authors follow the paper [An efficient minibatch acceptance test for Metropolis-Hastings], which introduces the concept of a correction distribution.


Review for NeurIPS paper: Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems

The paper proposes a novel MCMC-type algorithm to perform Bayesian inference on large datasets. The method combines replica exchange, Nos\'e-Hoover dynamics, and a non-standard acceptance criterion to deal with mini-batches. All the reviewers participated actively in the discussion after the rebuttal was made available. Although all the ingredients of the proposed method do exist, their combination is original and potentially useful for the ML literature, as pointed out by most reviewers. Theorem 2 is also neat and offers a nice way to propose swaps between replicas using mini-batches.


Review for NeurIPS paper: Decentralized Langevin Dynamics for Bayesian Learning

Neural Information Processing Systems

I think the distributed setting and all the subtleties that come with it should have been explored better. I was left wondering about the communication costs of an architecture needed for this to work, and potential issues with that. One obvious question would be: how would the iterate updates at each node be impacted by random noise injected into the w_{j}s being passed around in the comms channels, and/or by random drops or missed updates? The only difference between the convergence discussion in \S4.1 and other works in the literature that use similar machinery seems to be the formulations for the extra constants/iterate weights in the distributed setting. This reduces the novelty/significance of that section somewhat, in my opinion.


Review for NeurIPS paper: Decentralized Langevin Dynamics for Bayesian Learning

Neural Information Processing Systems

The paper addresses the important problem of Bayesian inference in a distributed setting, via a decentralized Langevin algorithm. Although the method is a natural extension of existing algorithms, its simplicity is an advantage, and the theoretical analysis is nontrivial. After considering the authors' response, all reviewers agreed that the paper will make a nice contribution to NeurIPS.