Bayesian Inference
Machine learning based digital twin for dynamical systems with multiple time-scales
Chakraborty, Souvik, Adhikari, Sondipon
Digital twin technology has a huge potential for widespread applications in different industrial sectors such as infrastructure, aerospace, and automotive. However, practical adoptions of this technology have been slower, mainly due to a lack of application-specific details. Here we focus on a digital twin framework for linear single-degree-of-freedom structural dynamic systems evolving in two different operational time scales in addition to its intrinsic dynamic time-scale. Our approach strategically separates into two components -- (a) a physics-based nominal model for data processing and response predictions, and (b) a data-driven machine learning model for the time-evolution of the system parameters. The physics-based nominal model is system-specific and selected based on the problem under consideration. On the other hand, the data-driven machine learning model is generic. For tracking the multi-scale evolution of the system parameters, we propose to exploit a mixture of experts as the data-driven model. Within the mixture of experts model, Gaussian Process (GP) is used as the expert model. The primary idea is to let each expert track the evolution of the system parameters at a single time-scale. For learning the hyperparameters of the `mixture of experts using GP', an efficient framework the exploits expectation-maximization and sequential Monte Carlo sampler is used. Performance of the digital twin is illustrated on a multi-timescale dynamical system with stiffness and/or mass variations. The digital twin is found to be robust and yields reasonably accurate results. One exciting feature of the proposed digital twin is its capability to provide reasonable predictions at future time-steps. Aspects related to the data quality and data quantity are also investigated.
Structure learning for CTBN's via penalized maximum likelihood methods
Shpak, Maryia, Miasojedow, Błażej, Rejchel, Wojciech
The continuous-time Bayesian networks (CTBNs) represent a class of stochastic processes, which can be used to model complex phenomena, for instance, they can describe interactions occurring in living processes, in social science models or in medicine. The literature on this topic is usually focused on the case when the dependence structure of a system is known and we are to determine conditional transition intensities (parameters of the network). In the paper, we study the structure learning problem, which is a more challenging task and the existing research on this topic is limited. The approach, which we propose, is based on a penalized likelihood method. We prove that our algorithm, under mild regularity conditions, recognizes the dependence structure of the graph with high probability. We also investigate the properties of the procedure in numerical studies to demonstrate its effectiveness.
Faster MCMC for Gaussian Latent Position Network Models
Spencer, Neil A., Junker, Brian, Sweet, Tracy M.
Latent position network models are a versatile tool in network science; applications include clustering entities, controlling for causal confounders, and defining priors over unobserved graphs. Estimating each node's latent position is typically framed as a Bayesian inference problem, with Metropolis within Gibbs being the most popular tool for approximating the posterior distribution. However, it is well-known that Metropolis within Gibbs is inefficient for large networks; the acceptance ratios are expensive to compute, and the resultant posterior draws are highly correlated. In this article, we propose an alternative Markov chain Monte Carlo strategy---defined using a combination of split Hamiltonian Monte Carlo and Firefly Monte Carlo---that leverages the posterior distribution's functional form for more efficient posterior computation. We demonstrate that these strategies outperform Metropolis within Gibbs and other algorithms on synthetic networks, as well as on real information-sharing networks of teachers and staff in a school district.
Horseshoe Prior Bayesian Quantile Regression
This paper extends the horseshoe prior of Carvalho et al. (2010) to the Bayesian quantile regression (HS-BQR) and provides a fast sampling algorithm that speeds up computation significantly in high dimensions. The performance of the HS-BQR is tested on large scale Monte Carlo simulations and an empirical application relevant to macroeoncomics. The Monte Carlo design considers several sparsity structures (sparse, dense, block) and error structures (i.i.d. errors and heteroskedastic errors). A number of LASSO based estimators (frequentist and Bayesian) are pitted against the HS-BQR to better gauge the performance of the method on the different designs. The HS-BQR yields just as good, or better performance than the other estimators considered when evaluated using coefficient bias and forecast error. We find that the HS-BQR is particularly potent in sparse designs and when estimating extreme quantiles. The simulations also highlight how the high dimensional quantile estimators fail to correctly identify the quantile function of the variables when both location and scale effects are present. In the empirical application, in which we evaluate forecast densities of US inflation, the HS-BQR provides well calibrated forecast densities whose individual quantiles, have the highest pseudo R squared, highlighting its potential for Value-at-Risk estimation.
Uncertainty Estimation with Infinitesimal Jackknife, Its Distribution and Mean-Field Approximation
Lu, Zhiyun, Ie, Eugene, Sha, Fei
Uncertainty quantification is an important research area in machine learning. Many approaches have been developed to improve the representation of uncertainty in deep models to avoid overconfident predictions. Existing ones such as Bayesian neural networks and ensemble methods require modifications to the training procedures and are computationally costly for both training and inference. Motivated by this, we propose mean-field infinitesimal jackknife (mfIJ) -- a simple, efficient, and general-purpose plug-in estimator for uncertainty estimation. The main idea is to use infinitesimal jackknife, a classical tool from statistics for uncertainty estimation to construct a pseudo-ensemble that can be described with a closed-form Gaussian distribution, without retraining. We then use this Gaussian distribution for uncertainty estimation. While the standard way is to sample models from this distribution and combine each sample's prediction, we develop a mean-field approximation to the inference where Gaussian random variables need to be integrated with the softmax nonlinear functions to generate probabilities for multinomial variables. The approach has many appealing properties: it functions as an ensemble without requiring multiple models, and it enables closed-form approximate inference using only the first and second moments of Gaussians. Empirically, mfIJ performs competitively when compared to state-of-the-art methods, including deep ensembles, temperature scaling, dropout and Bayesian NNs, on important uncertainty tasks. It especially outperforms many methods on out-of-distribution detection.
Compromise-free Bayesian neural networks
Javid, Kamran, Handley, Will, Hobson, Mike, Lasenby, Anthony
We conduct a thorough analysis of the relationship between the out-of-sample performance and the Bayesian evidence (marginal likelihood) of Bayesian neural networks (BNNs), as well as looking at the performance of ensembles of BNNs, both using the Boston housing dataset. Using the state-of-the-art in nested sampling, we numerically sample the full (non-Gaussian and multimodal) network posterior and obtain numerical estimates of the Bayesian evidence, considering network models with up to 156 trainable parameters. The networks have between zero and four hidden layers, either $\tanh$ or $ReLU$ activation functions, and with and without hierarchical priors. The ensembles of BNNs are obtained by determining the posterior distribution over networks, from the posterior samples of individual BNNs re-weighted by the associated Bayesian evidence values. There is good correlation between out-of-sample performance and evidence, as well as a remarkable symmetry between the evidence versus model size and out-of-sample performance versus model size planes. Networks with $ReLU$ activation functions have consistently higher evidences than those with $\tanh$ functions, and this is reflected in their out-of-sample performance. Ensembling over architectures acts to further improve performance relative to the individual BNNs.
Fast Maximum Likelihood Estimation and Supervised Classification for the Beta-Liouville Multinomial
Lakin, Steven Michael, Abdo, Zaid
The multinomial and related distributions have long been used to model categorical, count-based data in fields ranging from bioinformatics to natural language processing. Commonly utilized variants include the standard multinomial and the Dirichlet multinomial distributions due to their computational efficiency and straightforward parameter estimation process. However, these distributions make strict assumptions about the mean, variance, and covariance between the categorical features being modeled. If these assumptions are not met by the data, it may result in poor parameter estimates and loss in accuracy for downstream applications like classification. Here, we explore efficient parameter estimation and supervised classification methods using an alternative distribution, called the Beta-Liouville multinomial, which relaxes some of the multinomial assumptions. We show that the Beta-Liouville multinomial is comparable in efficiency to the Dirichlet multinomial for Newton-Raphson maximum likelihood estimation, and that its performance on simulated data matches or exceeds that of the multinomial and Dirichlet multinomial distributions. Finally, we demonstrate that the Beta-Liouville multinomial outperforms the multinomial and Dirichlet multinomial on two out of four gold standard datasets, supporting its use in modeling data with low to medium class overlap in a supervised classification context.
Detangling robustness in high dimensions: composite versus model-averaged estimation
Zhou, Jing, Claeskens, Gerda, Bradic, Jelena
Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory identifies equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings and focuses on prediction. In particular, we study optimally weighted model-averaged as well as composite $l_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization, without the assumption of perfect selection, as is often used in practice. Such weights are then optimal for prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. Real data application witnesses the method's practical use through the reconstruction of compressed audio signals.
Online Bayesian Goal Inference for Boundedly-Rational Planning Agents
Zhi-Xuan, Tan, Mann, Jordyn L., Silver, Tom, Tenenbaum, Joshua B., Mansinghka, Vikash K.
People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent's goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.