Bayesian Inference
Regularized EM algorithm
Houdouin, Pierre, Ollila, Esa, Pascal, Frederic
The Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing a (local) maximum likelihood estimate (MLE). It can be used in an extensive range of problems, including the clustering of data based on the Gaussian mixture model (GMM). Numerical instability and convergence problems may arise in situations where the sample size is not much larger than the data dimensionality. In such low sample support (LSS) settings, the covariance matrix update in the EM-GMM algorithm may become singular or poorly conditioned, causing the algorithm to crash. On the other hand, in many signal processing problems, a priori information can be available indicating certain structures for different cluster covariance matrices. In this paper, we present a regularized EM algorithm for GMMs that can make efficient use of such prior knowledge as well as cope with LSS situations. The method aims to maximize a penalized GMM likelihood where regularized estimation may be used to ensure positive definiteness of covariance matrix updates and shrink the estimators towards some structured target covariance matrices. We show that the theoretical guarantees of convergence hold, leading to a better-performing EM algorithm for structured covariance matrix models or in low sample support settings.
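The shrinkage idea in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible regularizer, a convex combination of the responsibility-weighted sample covariance and a target matrix; the function name, the `alpha` weight, and the identity target are hypothetical choices for illustration, not the penalized update derived in the paper.

```python
import numpy as np

def regularized_covariance(X, resp, mean, target, alpha):
    """Shrink a responsibility-weighted sample covariance towards a target.

    X: (n, d) data; resp: (n,) responsibilities of one cluster;
    mean: (d,) cluster mean; target: (d, d) positive definite target;
    alpha in [0, 1]: hypothetical shrinkage weight (0 means no shrinkage).
    """
    diff = X - mean
    weighted_cov = (resp[:, None] * diff).T @ diff / resp.sum()
    return (1.0 - alpha) * weighted_cov + alpha * target

rng = np.random.default_rng(0)
n, d = 5, 10  # fewer samples than dimensions: the LSS regime
X = rng.normal(size=(n, d))
resp = np.ones(n)
mean = X.mean(axis=0)

# Without shrinkage the update is rank deficient and cannot be inverted.
plain = regularized_covariance(X, resp, mean, np.eye(d), alpha=0.0)
# With shrinkage towards the identity every eigenvalue is at least alpha.
shrunk = regularized_covariance(X, resp, mean, np.eye(d), alpha=0.2)
```

In this toy setting the unregularized matrix has rank below the dimension, while the shrunk matrix is positive definite, which is the property an EM covariance update needs in order to evaluate Gaussian densities in the next E-step.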
An active inference model of car following: Advantages and applications
Wei, Ran, McDonald, Anthony D., Garcia, Alfredo, Markkula, Gustav, Engstrom, Johan, O'Kelly, Matthew
Driver process models play a central role in the testing, verification, and development of automated and autonomous vehicle technologies. Prior models developed from control theory and physics-based rules are limited in automated vehicle applications due to their restricted behavioral repertoire. Data-driven machine learning models are more capable than rule-based models but are limited by the need for large training datasets and their lack of interpretability, i.e., an understandable link between input data and output behaviors. We propose a novel car following modeling approach using active inference, which has comparable behavioral flexibility to data-driven models while maintaining interpretability. We assessed the proposed model, the Active Inference Driving Agent (AIDA), through a benchmark analysis against the rule-based Intelligent Driver Model, and two neural network Behavior Cloning models. The models were trained and tested on a real-world driving dataset using a consistent process. The testing results showed that the AIDA predicted driving controls significantly better than the rule-based Intelligent Driver Model and had similar accuracy to the data-driven neural network models in three out of four evaluations. Subsequent interpretability analyses illustrated that the AIDA's learned distributions were consistent with driver behavior theory and that visualizations of the distributions could be used to directly comprehend the model's decision making process and correct model errors attributable to limited training data. The results indicate that the AIDA is a promising alternative to black-box data-driven models and suggest a need for further research focused on modeling driving style and model training with more diverse datasets.
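For context on the rule-based baseline named in this abstract, the following is a minimal sketch of the Intelligent Driver Model acceleration rule in its commonly published form. The parameter values are illustrative assumptions, not the ones fitted in the study, and the sketch does not represent the AIDA itself.

```python
import math

def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a_max=1.0, b=1.5,
                     s0=2.0, delta=4.0):
    """Intelligent Driver Model acceleration in m/s^2.

    v: ego speed; v_lead: lead vehicle speed; gap: bumper-to-bumper gap.
    v0: desired speed; T: desired time headway; a_max: maximum acceleration;
    b: comfortable deceleration; s0: minimum gap; delta: free-road exponent.
    All default values are illustrative assumptions.
    """
    dv = v - v_lead  # positive when closing in on the lead vehicle
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# At the desired speed with an effectively empty road, acceleration is about zero.
free_road = idm_acceleration(v=30.0, v_lead=30.0, gap=1e9)
# Closing in fast on a slower vehicle at a short gap produces strong braking.
closing_in = idm_acceleration(v=20.0, v_lead=10.0, gap=10.0)
```

The whole behavior is fixed by six parameters, which makes the rule easy to read but limits the range of behaviors it can express.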
Modelling Determinants of Cryptocurrency Prices: A Bayesian Network Approach
Amirzadeh, Rasoul, Nazari, Asef, Thiruvady, Dhananjay, Ee, Mong Shan
The growth of market capitalisation and the number of altcoins (cryptocurrencies other than Bitcoin) provide investment opportunities and complicate the prediction of their price movements. A significant challenge in this volatile and relatively immature market is predicting cryptocurrency prices, which requires identifying the factors that influence them. This study investigates the factors influencing altcoin prices from a causal analysis perspective using Bayesian networks. In particular, the research question concerns the nature of the interactions between five leading altcoins, traditional financial assets including gold, oil, and the S&P 500, and social media. To answer this question, we create causal networks built from the historic price data of five traditional financial assets, social media data, and price data of altcoins. The ensuing networks are used for causal reasoning and diagnosis, and the results indicate that social media (in particular Twitter data in this study) is the most significant factor influencing altcoin prices. Furthermore, it is not possible to generalise the coins' reactions to changes in the factors. Consequently, the coins need to be studied separately for a particular price movement investigation.
Stochastic Model Predictive Control Utilizing Bayesian Neural Networks
Pohlodek, J., Alsmeier, H., Morabito, B., Schlauch, C., Savchenko, A., Findeisen, R.
Integrating measurements and historical data can enhance control systems through learning-based techniques, but ensuring performance and safety is challenging. Robust model predictive control strategies, like stochastic model predictive control, can address this by accounting for uncertainty. Gaussian processes are often used but have limitations with larger models and data sets. We explore Bayesian neural networks for stochastic learning-assisted control, comparing their performance to Gaussian processes on a wastewater treatment plant model. Results show Bayesian neural networks achieve similar performance, highlighting their potential as an alternative for control designs, particularly when handling extensive data sets.
Multi-agent Black-box Optimization using a Bayesian Approach to Alternating Direction Method of Multipliers
Krishnamoorthy, Dinesh, Paulson, Joel A.
Bayesian optimization (BO) is a powerful black-box optimization framework that looks to efficiently learn the global optimum of an unknown system by systematically trading off exploration and exploitation. However, the use of BO as a tool for coordinated decision-making in multi-agent systems with unknown structure has not been widely studied. This paper investigates a black-box optimization problem over a multi-agent network coupled via shared variables or constraints, where each subproblem is formulated as a BO that uses only its local data. The proposed multi-agent BO (MABO) framework adds a penalty term to traditional BO acquisition functions to account for coupling between the subsystems without data sharing. We derive a suitable form for this penalty term using the alternating direction method of multipliers (ADMM), which enables the local decision-making problems to be solved in parallel (and potentially asynchronously). The effectiveness of the proposed MABO method is demonstrated on an intelligent transport system for fuel-efficient vehicle platooning.
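The structure of an ADMM penalty on a shared variable can be sketched as follows. This is not the MABO algorithm: as a stated simplification, two known quadratic objectives stand in for the agents' acquisition functions, so each local subproblem has a closed-form minimizer. The function name and all numerical values are hypothetical.

```python
import numpy as np

def consensus_admm(targets, rho=1.0, iters=100):
    """Consensus ADMM for agents with local objectives (x - target_i)^2.

    Each agent minimizes its own objective plus the ADMM penalty
    lam_i * (x - z) + (rho / 2) * (x - z)^2, using only its local data.
    """
    targets = np.asarray(targets, dtype=float)
    lam = np.zeros_like(targets)  # one dual variable per agent
    z = 0.0                       # shared consensus variable
    for _ in range(iters):
        # Local steps: closed-form minimizers, solvable in parallel.
        x = (2.0 * targets - lam + rho * z) / (2.0 + rho)
        # Coordination step: average the dual-corrected local solutions.
        z = float(np.mean(x + lam / rho))
        # Dual step: penalize remaining disagreement with the consensus.
        lam = lam + rho * (x - z)
    return x, z, lam

x, z, lam = consensus_admm([1.0, 3.0])
```

The agents exchange only their proposed values and dual variables, never their objectives. In the BO setting the closed-form local step would be replaced by a numerical maximization of the penalized acquisition function.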
Autoregressive Conditional Neural Processes
Bruinsma, Wessel P., Markou, Stratis, Requiema, James, Foong, Andrew Y. K., Andersson, Tom R., Vaughan, Anna, Buonomo, Anthony, Hosking, J. Scott, Turner, Richard E.
Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.
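The autoregressive deployment described above can be sketched in a few lines. The predictor below is a hypothetical stand-in for a trained CNP (a kernel-weighted average with a heuristic standard deviation); only the sampling loop, which feeds each sampled target back into the context set, illustrates the chain-rule procedure.

```python
import numpy as np

def toy_predict(xc, yc, xt):
    """Hypothetical stand-in for a trained CNP: Gaussian mean and std at xt."""
    if len(xc) == 0:
        return 0.0, 1.0
    xc, yc = np.asarray(xc), np.asarray(yc)
    w = np.exp(-0.5 * (xc - xt) ** 2 / 0.1)
    mean = float(np.sum(w * yc) / (np.sum(w) + 1e-3))
    std = float(1.0 / np.sqrt(1.0 + np.sum(w)))
    return mean, std

def ar_sample(xc, yc, x_targets, rng):
    """Draw one joint sample via p(y1 | C) p(y2 | C, y1) ... (chain rule)."""
    xc, yc = list(xc), list(yc)
    sample = []
    for xt in x_targets:
        mean, std = toy_predict(xc, yc, xt)
        yt = rng.normal(mean, std)
        sample.append(yt)
        # Treat the sampled value as an observation for later predictions.
        xc.append(xt)
        yc.append(yt)
    return np.array(sample)

context_x, context_y = [0.0, 1.0], [0.0, 1.0]
targets = np.linspace(0.1, 0.9, 4)
sample = ar_sample(context_x, context_y, targets, np.random.default_rng(0))
```

Each conditional is a factorised Gaussian, yet the joint over all targets is not, because later means and standard deviations depend on earlier sampled values. The cost is one forward pass per target point rather than one pass for all targets.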
clusterBMA: Bayesian model averaging for clustering
Forbes, Owen, Santos-Fernandez, Edgar, Wu, Paul Pao-Yen, Xie, Hong-Bo, Schwenn, Paul E., Lagopoulos, Jim, Mills, Lia, Sacks, Dashiell D., Hermens, Daniel F., Mengersen, Kerrie
Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name.
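The consensus matrix step can be sketched as below for the hard-clustering case. The weights are simply given here as an assumption; in clusterBMA they come from an approximate posterior model probability, and the subsequent symmetric simplex matrix factorisation is not shown. Function names are hypothetical and do not mirror the R package.

```python
import numpy as np

def similarity_matrix(labels):
    """Pairwise co-clustering indicator: 1 if two points share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(label_sets, weights):
    """Weighted average of the similarity matrices of several clusterings."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so weights sum to one
    return sum(w * similarity_matrix(l) for w, l in zip(weights, label_sets))

# Two candidate clusterings of four points, with assumed model weights.
model_a = [0, 0, 1, 1]
model_b = [0, 1, 1, 1]
consensus = weighted_consensus([model_a, model_b], weights=[0.75, 0.25])
```

Entry (i, j) of the result is the weighted share of models that place points i and j in the same cluster, so it does not depend on how each model happens to number its clusters.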
Applications of Gaussian Processes at Extreme Lengthscales: From Molecules to Black Holes
In many areas of the observational and experimental sciences, data is scarce. Data observation in high-energy astrophysics is disrupted by celestial occlusions and limited telescope time, while data derived from laboratory experiments in synthetic chemistry and materials science is time- and cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism is often available in the sciences, such as the measurement error of a piece of laboratory apparatus. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs) ideal candidates for fitting such datasets. GPs can make predictions with consideration of uncertainty, for example in the virtual screening of molecules and materials, and can also make inferences about incomplete data such as the latent emission signature from a black hole accretion disc. Furthermore, GPs are currently the workhorse model for Bayesian optimisation, a methodology foreseen to be a guide for laboratory experiments in scientific discovery campaigns. The first contribution of this thesis is to use GP modelling to reason about the latent emission signature from the Seyfert galaxy Markarian 335, and by extension, to reason about the applicability of various theoretical models of black hole accretion discs. The second contribution is to extend the GP framework to molecular and chemical reaction representations and to provide an open-source software library to enable the framework to be used by scientists. The third contribution is to leverage GPs to discover novel and performant photoswitch molecules. The fourth contribution is to introduce a Bayesian optimisation scheme capable of modelling aleatoric uncertainty to facilitate the identification of material compositions that possess intrinsic robustness to large-scale fabrication processes.
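How a GP turns a handful of observations into predictions with uncertainty can be shown with a minimal sketch of exact GP regression. It assumes a unit-variance RBF kernel and a known noise variance, which plays the role of the known measurement error mentioned above; the data and hyperparameters are illustrative and unrelated to the thesis's experiments.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Unit-variance RBF kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4):
    """Posterior mean and std of the latent function at x_test."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)  # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) is 1
    return mean, np.sqrt(np.maximum(var, 0.0))

x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 5.0])  # one observed location, one far from the data
mean, std = gp_posterior(x_train, y_train, x_test)
```

The predictive standard deviation is small at the observed location and returns to the prior value far from the data, which is the behavior that makes GPs useful when observations are scarce.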
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
Mete, Akshay, Singh, Rahul, Kumar, P. R.
We consider the problem of controlling an unknown stochastic linear system with quadratic costs, known as the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than forty years ago, and which predates the "Upper Confidence Bound" (UCB) method as well as the definition of "regret" for bandit problems. It simply added a term favoring parameters with larger rewards to the criterion for parameter estimation. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish that theoretically, this method retains $\tilde{\mathcal{O}}(\sqrt{T})$ regret, the best-known so far. We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) with UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence and StabL on many real-world examples, including flight control of a Boeing 747 and an Unmanned Aerial Vehicle. We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.
Master's Thesis: Out-of-distribution Detection with Energy-based Models
Today, deep learning is increasingly applied in security-critical situations such as autonomous driving and medical diagnosis. Despite its success, the behavior and robustness of deep networks are not fully understood yet, posing a significant risk. In particular, researchers recently found that neural networks are overly confident in their predictions, even on data they have never seen before. To tackle this issue, one can differentiate two approaches in the literature. One accounts for uncertainty in the predictions, while the second estimates the underlying density of the training data to decide whether a given input is close to the training data, and thus whether the network is able to perform as expected. In this thesis, we investigate the capabilities of energy-based models (EBMs) at the task of fitting the training data distribution to perform detection of out-of-distribution (OOD) inputs. We find that on most datasets, EBMs do not inherently outperform other density estimators at detecting OOD data despite their flexibility. Thus, we additionally investigate the effects of supervision, dimensionality reduction, and architectural modifications on the performance of EBMs. Further, we propose Energy-Prior Network (EPN), which enables estimation of various uncertainties within an EBM for classification, bridging the gap between two approaches for tackling the OOD detection problem. We identify a connection between the concentration parameters of the Dirichlet distribution and the joint energy in an EBM. Additionally, this allows optimization without a held-out OOD dataset, which might be unavailable or costly to collect in some applications. Finally, we empirically demonstrate that EPN is able to detect OOD inputs, dataset shifts, and adversarial examples. Theoretically, EPN offers favorable properties for the asymptotic case when inputs are far from the training data.
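One common way to obtain an energy from a classifier, sketched below, is to take the negative log-sum-exp of its logits, so that inputs with no strongly supported class receive a higher energy. This formulation is an assumption drawn from the wider EBM literature and is not necessarily the exact score or the EPN construction used in the thesis; the logits are made-up values.

```python
import numpy as np

def energy_score(logits):
    """Energy E(x) = -log sum_y exp(f_y(x)), computed stably."""
    logits = np.asarray(logits, dtype=float)
    m = logits.max(axis=-1, keepdims=True)  # subtract max to avoid overflow
    lse = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return -lse

# Hypothetical logits: one class strongly supported vs. no class supported.
confident = energy_score([10.0, 0.0, 0.0])
flat = energy_score([0.0, 0.0, 0.0])
```

Under this score, an input would be flagged as OOD when its energy exceeds a threshold chosen on in-distribution validation data.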