Bayesian Inference
A deep latent variable model for semi-supervised multi-unit soft sensing in industrial processes
Grimstad, Bjarne, Lรธvland, Kristian, Imsland, Lars S., Gunnerud, Vidar
In many industrial processes, an apparent lack of data limits the development of data-driven soft sensors. There are, however, often opportunities to learn stronger models by being more data-efficient. To achieve this, one can leverage knowledge about the data from which the soft sensor is learned. Taking advantage of properties frequently possessed by industrial data, we introduce a deep latent variable model for semi-supervised multi-unit soft sensing. This hierarchical, generative model is able to jointly model different units, as well as learning from both labeled and unlabeled data. An empirical study of multi-unit soft sensing is conducted using two datasets: a synthetic dataset of single-phase fluid flow, and a large, real dataset of multi-phase flow in oil and gas wells. We show that by combining semi-supervised and multi-task learning, the proposed model achieves superior results, outperforming current leading methods for this soft sensing problem. We also show that when a model has been trained on a multi-unit dataset, it may be finetuned to previously unseen units using only a handful of data points. In this finetuning procedure, unlabeled data improve soft sensor performance; remarkably, this is true even when no labeled data are available.
Dimension-reduced Reconstruction Map Learning for Parameter Estimation in Likelihood-Free Inference Problems
Zhang, Rui, Chkrebtii, Oksana A., Xiu, Dongbin
Many application areas rely on models that can be readily simulated but lack a closed-form likelihood, or an accurate approximation under arbitrary parameter values. Existing parameter estimation approaches in this setting are generally approximate. Recent work on using neural network models to reconstruct the mapping from the data space to the parameters from a set of synthetic parameter-data pairs suffers from the curse of dimensionality, resulting in inaccurate estimation as the data size grows. We propose a dimension-reduced approach to likelihood-free estimation which combines the ideas of reconstruction map estimation with dimension-reduction approaches based on subject-specific knowledge. We examine the properties of reconstruction map estimation with and without dimension reduction and explore the trade-off between approximation error due to information loss from reducing the data dimension and approximation error. Numerical examples show that the proposed approach compares favorably with reconstruction map estimation, approximate Bayesian computation, and synthetic likelihood estimation.
Byzantine-tolerant distributed learning of finite mixture models
This paper proposes two split-and-conquer (SC) learning estimators for finite mixture models that are tolerant to Byzantine failures. In SC learning, individual machines obtain local estimates, which are then transmitted to a central server for aggregation. During this communication, the server may receive malicious or incorrect information from some local machines, a scenario known as Byzantine failures. While SC learning approaches have been devised to mitigate Byzantine failures in statistical models with Euclidean parameters, developing Byzantine-tolerant methods for finite mixture models with non-Euclidean parameters requires a distinct strategy. Our proposed distance-based methods are hyperparameter tuning free, unlike existing methods, and are resilient to Byzantine failures while achieving high statistical efficiency. We validate the effectiveness of our methods both theoretically and empirically via experiments on simulated and real data from machine learning applications for digit recognition. The code for the experiment can be found at https://github.com/SarahQiong/RobustSCGMM.
Understanding Reference Policies in Direct Preference Optimization
Liu, Yixin, Liu, Pengfei, Cohan, Arman
Direct Preference Optimization (DPO) has become a widely used training method for the instruction fine-tuning of large language models (LLMs). In this work, we explore an under-investigated aspect of DPO - its dependency on the reference model or policy. Such reference policies, typically instantiated as the model to be further fine-tuned, are important since they can impose an upper limit on DPO's effectiveness. Therefore, we address three related research questions in this work. First, we explore the optimal strength of the KL-divergence constraint in DPO, which penalizes deviations from the reference policy, and find that DPO is sensitive to this strength. Next, we examine the necessity of reference policies for instruction fine-tuning by providing both theoretical and empirical comparisons between DPO and related learning objectives, demonstrating DPO's superiority. Additionally, we investigate whether DPO benefits from stronger reference policies, finding that a stronger reference policy can lead to improved performance, but only when it is similar to the model being fine-tuned. Our findings highlight the confounding role of reference policies in DPO and offer insights for best practices, while also identifying open research questions for future studies.
Deterministic Trajectory Optimization through Probabilistic Optimal Control
Filabadi, Mohammad Mahmoudi, Lefebvre, Tom, Crevecoeur, Guillaume
This article proposes two new algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called trajectory optimization problems. Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control, that reformulates optimal control as an equivalent probabilistic inference problem. This perspective allows to address the problem using the Expectation-Maximization algorithm. We show that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods resulting into two distinct algorithms. The algorithms are structurally closest related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of our work is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.
Discovering governing equation in structural dynamics from acceleration-only measurements
Alvares, Calvin, Chakraborty, Souvik
Over the past few years, equation discovery has gained popularity in different fields of science and engineering. However, existing equation discovery algorithms rely on the availability of noisy measurements of the state variables (i.e., displacement {and velocity}). This is a major bottleneck in structural dynamics, where we often only have access to acceleration measurements. To that end, this paper introduces a novel equation discovery algorithm for discovering governing equations of dynamical systems from acceleration-only measurements. The proposed algorithm employs a library-based approach for equation discovery. To enable equation discovery from acceleration-only measurements, we propose a novel Approximate Bayesian Computation (ABC) model that prioritizes parsimonious models. The efficacy of the proposed algorithm is illustrated using {four} structural dynamics examples that include both linear and nonlinear dynamical systems. The case studies presented illustrate the possible application of the proposed approach for equation discovery of dynamical systems from acceleration-only measurements.
Scalable Monte Carlo for Bayesian Learning
Fearnhead, Paul, Nemeth, Christopher, Oates, Chris J., Sherlock, Chris
This book aims to provide a graduate-level introduction to advanced topics in Markov chain Monte Carlo (MCMC) algorithms, as applied broadly in the Bayesian computational context. Most, if not all of these topics (stochastic gradient MCMC, non-reversible MCMC, continuous time MCMC, and new techniques for convergence assessment) have emerged as recently as the last decade, and have driven substantial recent practical and theoretical advances in the field. A particular focus is on methods that are scalable with respect to either the amount of data, or the data dimension, motivated by the emerging high-priority application areas in machine learning and AI.
On the Calibration of Epistemic Uncertainty: Principles, Paradoxes and Conflictual Loss
Fellaji, Mohammed, Pennerath, Frรฉdรฉric, Conan-Guez, Brieuc, Couceiro, Miguel
The calibration of predictive distributions has been widely studied in deep learning, but the same cannot be said about the more specific epistemic uncertainty as produced by Deep Ensembles, Bayesian Deep Networks, or Evidential Deep Networks. Although measurable, this form of uncertainty is difficult to calibrate on an objective basis as it depends on the prior for which a variety of choices exist. Nevertheless, epistemic uncertainty must in all cases satisfy two formal requirements: firstly, it must decrease when the training dataset gets larger and, secondly, it must increase when the model expressiveness grows. Despite these expectations, our experimental study shows that on several reference datasets and models, measures of epistemic uncertainty violate these requirements, sometimes presenting trends completely opposite to those expected. These paradoxes between expectation and reality raise the question of the true utility of epistemic uncertainty as estimated by these models. A formal argument suggests that this disagreement is due to a poor approximation of the posterior distribution rather than to a flaw in the measure itself. Based on this observation, we propose a regularization function for deep ensembles, called conflictual loss in line with the above requirements. We emphasize its strengths by showing experimentally that it fulfills both requirements of epistemic uncertainty, without sacrificing either the performance nor the calibration of the deep ensembles.
Ensemble Transport Filter via Optimized Maximum Mean Discrepancy
In this paper, we present a new ensemble-based filter method by reconstructing the analysis step of the particle filter through a transport map, which directly transports prior particles to posterior particles. The transport map is constructed through an optimization problem described by the Maximum Mean Discrepancy loss function, which matches the expectation information of the approximated posterior and reference posterior. The proposed method inherits the accurate estimation of the posterior distribution from particle filtering. To improve the robustness of Maximum Mean Discrepancy, a variance penalty term is used to guide the optimization. It prioritizes minimizing the discrepancy between the expectations of highly informative statistics for the approximated and reference posteriors. The penalty term significantly enhances the robustness of the proposed method and leads to a better approximation of the posterior. A few numerical examples are presented to illustrate the advantage of the proposed method over the ensemble Kalman filter.
Walking the Values in Bayesian Inverse Reinforcement Learning
Bajgar, Ondrej, Abate, Alessandro, Gatsis, Konstantinos, Osborne, Michael A.
The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem - going from rewards to the Q values - at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight - and illustrate its advantages on several tasks.