Bayesian Learning
Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems
We present a probabilistic deep learning methodology that enables the construction of predictive data-driven surrogates for stochastic systems. Leveraging recent advances in variational inference with implicit distributions, we put forth a statistical inference framework that enables the end-to-end training of surrogate models on paired input-output observations that may be stochastic in nature, originate from different information sources of variable fidelity, or be corrupted by complex noise processes. The resulting surrogates can accommodate high-dimensional inputs and outputs and are able to return predictions with quantified uncertainty. The effectiveness of our approach is demonstrated through a series of canonical studies, including the regression of noisy data, multi-fidelity modeling of stochastic processes, and uncertainty propagation in high-dimensional dynamical systems.
Large-Scale Joint Topic, Sentiment & User Preference Analysis for Online Reviews
Yu, Xinli, Chen, Zheng, Yang, Wei-Shih, Hu, Xiaohua, Yan, Erjia
This paper presents a non-trivial reconstruction of a previous joint topic-sentiment-preference review model, TSPRA, with a stick-breaking representation under the framework of variational inference (VI) and stochastic variational inference (SVI). TSPRA is a Gibbs-sampling-based model that jointly solves for topics, word sentiments and user preferences and has been shown to achieve good performance, but for large data sets it can only learn from a relatively small sample. We develop the variational models vTSPRA and svTSPRA to reduce running time, and our new approach is capable of processing millions of reviews. We rebuild the generative process, improve the rating regression, derive and present the coordinate-ascent updates of the variational parameters, and show that the time complexity of each iteration is theoretically linear in the corpus size. Experiments on Amazon data sets show that it converges faster than TSPRA and attains better results given the same amount of time. In addition, we tune svTSPRA into an online algorithm, ovTSPRA, that can monitor oscillations of sentiment and preference over time. Some interesting fluctuations are captured and possible explanations are provided. The results give strong visual evidence that user preference is better treated as a factor independent of sentiment.
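The stick-breaking representation mentioned above can be illustrated with a minimal sketch. This is the generic truncated construction for Dirichlet process weights, not the TSPRA, vTSPRA or svTSPRA model itself; the function name `stick_breaking_weights`, the concentration value and the truncation level are assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha is the concentration parameter; n_components is an assumed
    truncation level (hypothetical, chosen for illustration).
    """
    # Fraction of the remaining stick broken off at each step.
    betas = rng.beta(1.0, alpha, size=n_components)
    # Stick length remaining before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, n_components=20, rng=rng)
# Mass left unassigned because the construction is truncated.
leftover = 1.0 - weights.sum()
```

In variational treatments the truncation level is usually set large enough that `leftover` is negligible, so the truncated weights behave like a finite mixture with most components unused.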
A Modern Retrospective on Probabilistic Numerics
The field of probabilistic numerics (PN), loosely speaking, attempts to provide a statistical treatment of the errors and/or approximations that are made en route to the output of a deterministic numerical method, e.g. the approximation of an integral by quadrature, or the discretised solution of an ordinary or partial differential equation. This decade has seen a surge of activity in this field. In comparison with historical developments that can be traced back over more than a hundred years, the most recent developments are particularly interesting because they have been characterised by simultaneous input from multiple scientific disciplines: mathematics, statistics, machine learning, and computer science. The field has, therefore, advanced on a broad front, with contributions ranging from the building of overarching general theory to practical implementations in specific problems of interest. Over the same period of time, and because of increased interaction among researchers coming from different communities, the extent to which these developments were -- or were not -- presaged by twentieth-century researchers has also come to be better appreciated. Thus, the time appears to be ripe for an update of the 2014 Tübingen Manifesto on probabilistic numerics [Hennig, 2014, Osborne, 2014d,c,b,a] and the position paper [Hennig et al., 2015] to take account of the developments between 2014 and 2019, an improved awareness of the history of this field, and a clearer sense of its future directions. In this article, we aim to summarise some of the history of probabilistic perspectives on numerics (Section 2), to place more recent developments into context (Section 3), and to articulate a vision for future research in, and use of, probabilistic numerics (Section 4).
An introduction to domain adaptation and transfer learning
In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, we present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? We will start with a brief introduction to risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, we discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there is a wide variety of approaches. These are categorized into: importance-weighting, subspace mapping, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which we will discuss in the last section. We conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
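Importance-weighting under covariate shift, one of the approaches listed above, can be sketched as follows. The example assumes both domains are one-dimensional Gaussians with known parameters, which is rarely true in practice; the helper names and the per-sample loss are hypothetical and serve only to show that reweighting source losses by p_target(x) / p_source(x) recovers the target risk.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, source=(0.0, 1.0), target=(1.0, 1.0)):
    """w(x) = p_target(x) / p_source(x); both densities are assumed known."""
    return gaussian_pdf(x, *target) / gaussian_pdf(x, *source)


def weighted_risk(losses, weights):
    """Importance-weighted empirical risk: mean of w(x_i) * loss_i."""
    return float(np.mean(weights * losses))


rng = np.random.default_rng(0)
x_source = rng.normal(0.0, 1.0, size=20000)  # samples from the source domain

# Hypothetical per-sample loss, standing in for a classifier's loss at x.
losses = (x_source - 1.0) ** 2

source_risk = float(np.mean(losses))  # estimates the risk under the source
target_risk = weighted_risk(losses, importance_weights(x_source))
```

With these assumed densities the expected loss is 2 under the source and 1 under the target, so `source_risk` and `target_risk` should land near those values. When the densities are unknown, the weights have to be estimated, which is where most of the practical difficulty lies.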
GPdoemd: a Python package for design of experiments for model discrimination
Olofsson, Simon, Hebing, Lukas, Niedenführ, Sebastian, Deisenroth, Marc Peter, Misener, Ruth
Model discrimination identifies a mathematical model that usefully explains and predicts a given system's behaviour. Researchers will often have several models, i.e. hypotheses, about an underlying system mechanism, but insufficient experimental data to discriminate between the models, i.e. discard inaccurate models. Given rival mathematical models and an initial experimental data set, optimal design of experiments suggests maximally informative experimental observations that maximise a design criterion weighted by prediction uncertainty. The model uncertainty requires gradients, which may not be readily available for black-box models. This paper (i) proposes a new design criterion using the Jensen-Rényi divergence, and (ii) develops a novel method replacing black-box models with Gaussian process surrogates. Using the surrogates, we marginalise out the model parameters with approximate inference. Results show these contributions working well for both classical and new test instances. We also (iii) introduce and discuss GPdoemd, the open-source implementation of the Gaussian process surrogate method.
A Fully Bayesian Infinite Generative Model for Dynamic Texture Segmentation
Yousefi, Sahar, Shalmani, M. T. Manzuri, Chan, Antoni B.
Generative dynamic texture models (GDTMs) are widely used for dynamic texture (DT) segmentation in video sequences. GDTMs represent DTs as a set of linear dynamical systems (LDSs). A major limitation of these models concerns the automatic selection of a proper number of DTs. Dirichlet process mixture (DPM) models, which have recently emerged as a cornerstone of non-parametric Bayesian statistics, are a promising candidate for resolving this issue. Motivated by this, we propose a novel non-parametric, fully Bayesian approach for DT segmentation, formulated on the basis of a joint DPM and GDTM construction. This combination allows the algorithm to select the number of DTs automatically during segmentation. We derive the Variational Bayesian Expectation-Maximization (VBEM) inference for the proposed model. Moreover, in the E-step of inference, we apply the Rauch-Tung-Striebel smoother (RTSS) algorithm to the Variational Bayesian LDSs. Finally, experiments on different video sequences are performed. Experimental results indicate that the proposed algorithm noticeably outperforms previous methods in efficiency and accuracy.
Input Prioritization for Testing Neural Networks
Byun, Taejoon, Sharma, Vaibhav, Vijayakumar, Abhishek, Rayadurgam, Sanjai, Cofer, Darren
Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data--the expected output, a.k.a. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way in increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment, derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assessed the efficacy of three such sentiment measures for prioritization--confidence, uncertainty and surprise--and compared their effectiveness in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%.
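The idea of ranking unlabeled inputs by a score computed from the model's own output can be sketched as below. This is an illustration only and not the authors' implementation: it uses softmax confidence and predictive entropy as simple stand-ins for the confidence and uncertainty measures, omits the surprise measure, and all function names and the example logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def confidence_score(probs):
    """Probability assigned to the predicted class."""
    return probs.max(axis=1)


def entropy_score(probs):
    """Predictive entropy; larger values mean a less certain prediction."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)


def prioritize(logits, measure="entropy"):
    """Return input indices ordered from most to least suspicious."""
    probs = softmax(logits)
    if measure == "entropy":
        priority = entropy_score(probs)
    elif measure == "confidence":
        priority = -confidence_score(probs)  # low confidence goes first
    else:
        raise ValueError(f"unknown measure: {measure}")
    return np.argsort(-priority, kind="stable")


# Hypothetical logits for three inputs of a 3-class model.
logits = np.array([
    [8.0, 0.0, 0.0],   # confident prediction
    [1.0, 0.9, 1.1],   # nearly uniform, very uncertain
    [3.0, 2.5, 0.0],   # two competing classes
])
order_by_entropy = prioritize(logits, "entropy")
order_by_confidence = prioritize(logits, "confidence")
```

Both orderings put the nearly uniform input first and the confident one last, so a tester with a limited labeling budget would label the uncertain inputs before the others.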
Machine Learning Automation Toolbox (MLaut)
Kazakov, Viktor, Király, Franz J.
MLaut automates large-scale evaluation and benchmarking of machine learning algorithms on a large number of datasets. MLaut provides a high-level workflow interface to machine learning algorithms, implements a local back-end to a database of dataset collections, trained algorithms, and experimental results, and provides easy-to-use interfaces to the scikit-learn and keras modelling libraries. Experiments are easy to set up with default settings in a few lines of code, while remaining fully customizable to the level of hyper-parameter tuning, pipeline composition, or deep learning architecture. As a principal test case for MLaut, we conducted a large-scale supervised classification study in order to benchmark the performance of a number of machine learning algorithms - to our knowledge also the first larger-scale study on standard supervised learning data sets to include deep learning algorithms. While corroborating a number of previous findings in the literature, we found (within the limitations of our study) that deep neural networks do not perform well on basic supervised learning, i.e., outside the more specialized, image-, audio-, or text-based tasks.
A Bayesian Decision Tree Algorithm
Nuti, Giuseppe, Rugama, Lluís Antoni Jiménez, Cross, Andreea-Ingrid
Bayesian Decision Trees are known for their probabilistic interpretability. However, their construction can sometimes be costly. In this article we present a general Bayesian Decision Tree algorithm applicable to both regression and classification problems. The algorithm does not apply Markov Chain Monte Carlo and does not require a pruning step. While it is possible to construct a weighted probability tree space, we find that one particular tree, the greedy-modal tree (GMT), explains most of the information contained in the numerical examples. This approach seems to perform similarly to Random Forests. Keywords: Machine learning · Bayesian statistics · Decision Trees · Random Forests. Decision trees are popular machine learning techniques applied to both classification and regression tasks.
No-regret Bayesian Optimization with Unknown Hyperparameters
Berkenkamp, Felix, Schoellig, Angela P., Krause, Andreas
Bayesian optimization (BO) based on Gaussian process models is a powerful paradigm to optimize black-box functions that are expensive to evaluate. While several BO algorithms provably converge to the global optimum of the unknown function, they assume that the hyperparameters of the kernel are known in advance. This is not the case in practice and misspecification often causes these algorithms to converge to poor local optima. In this paper, we present the first BO algorithm that is provably no-regret and converges to the optimum without knowledge of the hyperparameters. We slowly adapt the hyperparameters of stationary kernels and thereby expand the associated function class over time, so that the BO algorithm considers more complex function candidates. Based on the theoretical insights, we propose several practical algorithms that achieve the empirical data efficiency of BO with online hyperparameter estimation, but retain theoretical convergence guarantees. We evaluate our method on several benchmark problems.
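A rough sketch of the idea of expanding the function class over time is given below: a GP-UCB loop on a one-dimensional grid in which the RBF lengthscale shrinks as iterations proceed. This is not the authors' algorithm and carries none of its guarantees; the shrinkage schedule, the fixed `beta`, the noise level, the grid and the toy objective are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    """Unit-variance squared-exponential kernel between 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_test, lengthscale, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_test."""
    k_train = rbf_kernel(x_train, x_train, lengthscale)
    k_train += noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_test, lengthscale)
    mean = k_cross.T @ np.linalg.solve(k_train, y_train)
    var = 1.0 - np.sum(k_cross * np.linalg.solve(k_train, k_cross), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def bo_with_shrinking_lengthscale(f, grid, n_iter=25, lengthscale0=1.0, beta=2.0):
    """GP-UCB on a grid, shrinking the lengthscale at every iteration."""
    x_obs = np.array([grid[0]])
    y_obs = np.array([f(grid[0])])
    for t in range(1, n_iter + 1):
        # Hypothetical schedule: a shorter lengthscale admits rougher functions.
        lengthscale = lengthscale0 / (1.0 + 0.1 * t)
        mean, std = gp_posterior(x_obs, y_obs, grid, lengthscale)
        x_next = grid[np.argmax(mean + beta * std)]  # upper confidence bound
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


def objective(x):
    """Toy black-box function with its maximum at x = 0.3."""
    return -(x - 0.3) ** 2


grid = np.linspace(0.0, 1.0, 201)
x_best, y_best = bo_with_shrinking_lengthscale(objective, grid)
```

Starting with a long lengthscale makes early iterations data-efficient, while the gradual shrinkage keeps the model from being permanently committed to an overly smooth function class if the initial guess was wrong.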