Bayesian Inference
Ensemble Neural Networks (ENN): A gradient-free stochastic method
Chen, Yuntian, Chang, Haibin, Jin, Meng, Zhang, Dongxiao
Abstract: In this study, an efficient stochastic gradient-free method, the ensemble neural networks (ENN), is developed. In the ENN, the optimization process relies on covariance matrices rather than derivatives. The covariance matrices are calculated by the ensemble randomized maximum likelihood algorithm (EnRML), which is an inverse modeling method. The ENN is able to simultaneously provide estimations and perform uncertainty quantification since it is built under the Bayesian framework. The ENN is also robust to small training data sizes because the ensemble of stochastic realizations essentially enlarges the training dataset. This constitutes a desirable characteristic, especially for real-world engineering applications. In addition, the ENN does not require the calculation of gradients, which enables the use of complicated neuron models and loss functions in neural networks. We experimentally demonstrate the benefits of the proposed model, in particular showing that the ENN performs much better than traditional Bayesian neural networks (BNN). The EnRML in ENN is a substitute for gradient-based optimization algorithms, which means that it can be directly combined with the feed-forward process in other existing (deep) neural networks, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), broadening future applications of the ENN.
Keywords: Inverse modeling, Gradient-free, Uncertainty quantification, Robust to small data size, Stochastic method
1. Introduction
Artificial neural networks (ANN) are computing systems inspired by the biological neural networks that constitute animal brains. ANN are capable of approximating nonlinear functional relationships between input and output variables (Kim et al., 2018). From a mathematical perspective, a neural network can approximate any continuous function to any given precision with a sufficiently large number of basis functions (Cybenko, 1989; Hornik, 1991).
In addition, we can use much smaller models by constructing hierarchical neural networks (Delalleau & Bengio, 2011; Gal, 2016). The basic processing elements of neural networks are neurons. A collection of neurons is referred to as a layer, and the collection of interconnected layers forms the neural network (Kim et al., 2018). A four-layer neural network is illustrated in Figure 1 as an example. In a neuron, the output is calculated by applying a nonlinear function to the weighted sum of its inputs. The connections between neurons in adjacent layers are represented by the weights in a model. The weights adjust as learning proceeds, and they represent the strength of the signal at a connection. The nonlinear function is also called the activation function, and the most popular choices are sigmoid, tansig, and ReLU (Li et al., 2015). ANN have been widely applied to solving real-world engineering problems, and the following three topics are significant for effective applications.
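The abstract describes training a network with an ensemble update that uses only covariance matrices, never derivatives. A minimal sketch of that idea follows, under stated assumptions. For brevity it swaps in the simpler ensemble smoother with multiple data assimilation (ES-MDA) in place of the authors' EnRML. It trains a hypothetical one-hidden-layer tanh network (`feed_forward`, `N_HIDDEN`) on synthetic sine data. All names, the network size, the prior, and the step count are illustrative choices, not values from the paper.

```python
import numpy as np

N_HIDDEN = 5  # hypothetical width of the single hidden layer


def feed_forward(theta, x):
    """1-N_HIDDEN-1 network: each hidden neuron applies tanh to a weighted sum of its input."""
    w1 = theta[:N_HIDDEN]
    b1 = theta[N_HIDDEN:2 * N_HIDDEN]
    w2 = theta[2 * N_HIDDEN:3 * N_HIDDEN]
    b2 = theta[3 * N_HIDDEN]
    hidden = np.tanh(np.outer(x, w1) + b1)   # (n_points, N_HIDDEN)
    return hidden @ w2 + b2                  # (n_points,)


def es_mda(ensemble, forward, d_obs, obs_std, n_steps, rng):
    """Covariance-based ensemble update (ES-MDA); `forward` is evaluated, never differentiated."""
    n_ens = ensemble.shape[0]
    var_inflated = n_steps * obs_std ** 2            # observation noise inflated per step
    for _ in range(n_steps):
        preds = np.array([forward(m) for m in ensemble])        # (n_ens, n_obs)
        dm = ensemble - ensemble.mean(axis=0)
        dd = preds - preds.mean(axis=0)
        c_md = dm.T @ dd / (n_ens - 1)                           # parameter-prediction covariance
        c_dd = dd.T @ dd / (n_ens - 1)                           # prediction covariance
        c_d = var_inflated * np.eye(d_obs.size)
        # Each member is matched against its own noisy copy of the observations.
        d_pert = d_obs + rng.normal(0.0, np.sqrt(var_inflated), size=preds.shape)
        gain_t = np.linalg.solve(c_dd + c_d, c_md.T)             # transpose of the Kalman-type gain
        ensemble = ensemble + (d_pert - preds) @ gain_t
    return ensemble


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 30)
obs_std = 0.05
d_obs = np.sin(x) + rng.normal(0.0, obs_std, size=x.size)   # synthetic training data


def forward(theta):
    return feed_forward(theta, x)


def mean_rmse(ensemble):
    return float(np.mean([np.sqrt(np.mean((forward(m) - d_obs) ** 2)) for m in ensemble]))


prior = rng.normal(0.0, 1.0, size=(200, 3 * N_HIDDEN + 1))
posterior = es_mda(prior, forward, d_obs, obs_std, n_steps=4, rng=rng)
print(mean_rmse(prior), mean_rmse(posterior))
```

The result is an ensemble of weight vectors. The spread of `forward(m)` across posterior members therefore gives a rough uncertainty band alongside the mean prediction, which matches the abstract's point that estimation and uncertainty quantification come together.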
A Hierarchical Bayesian Model for Size Recommendation in Fashion
Guigourès, Romain, Ho, Yuen King, Koriagin, Evgenii, Sheikh, Abdul-Saboor, Bergmann, Urs, Shirvany, Reza
We introduce a hierarchical Bayesian approach to tackle the challenging problem of size recommendation in e-commerce fashion. Our approach jointly models a size purchased by a customer and its possible return event: 1. no return, 2. returned too small, 3. returned too big. Those events are drawn from a multinomial distribution parameterized by the joint probability of each event, built following a hierarchy combining priors. Such a model allows us to incorporate extended domain expertise and article characteristics as prior knowledge, which in turn allows the underlying parameters to emerge once sufficient data are available. Experiments are presented on real (anonymized) data from millions of customers, along with a detailed discussion of the efficiency of such an approach within a large-scale production system.
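The abstract does not give the model's exact parameterization. The sketch below therefore assumes a conjugate Dirichlet-multinomial form to show how a hierarchy of priors lets data override prior knowledge as counts accumulate. Each level's estimate becomes the prior mean for the level below. The three-level chain (global, category, article), `shrink_event_probs`, and all numbers are hypothetical.

```python
import numpy as np

# Event order follows the abstract: no return, returned too small, returned too big.
EVENTS = ("no_return", "returned_too_small", "returned_too_big")


def shrink_event_probs(counts, prior_probs, prior_strength):
    """Posterior mean of multinomial event probabilities under a Dirichlet prior.

    The prior is centred on `prior_probs` (e.g. the parent level's estimate)
    and carries `prior_strength` pseudo-counts; observed counts dominate as they grow.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = prior_strength * np.asarray(prior_probs, dtype=float)
    return (alpha + counts) / (alpha.sum() + counts.sum())


# Illustrative two-step hierarchy: global -> category -> article.
global_probs = np.array([0.8, 0.1, 0.1])
category_probs = shrink_event_probs([70, 20, 10], global_probs, prior_strength=20)
article_probs = shrink_event_probs([5, 4, 1], category_probs, prior_strength=20)
```

An article with no purchases falls back exactly to its category's estimate. A frequently "too small" article drifts toward that event in proportion to its evidence.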
Probabilistic Residual Learning for Aleatoric Uncertainty in Image Restoration
Aleatoric uncertainty is an intrinsic property of ill-posed inverse and imaging problems. Its quantification is vital for assessing the reliability of relevant point estimates. In this paper, we propose an efficient framework for quantifying aleatoric uncertainty for deep residual learning and showcase its significant potential on image restoration. In the framework, we divide the conditional probability modeling for the residual variable into a deterministic homo-dimensional level, a stochastic low-dimensional level and a merging level. The low-dimensionality is especially suitable for sparse correlation between image pixels, enables efficient sampling for high dimensional problems and acts as a regularizer for the distribution. Preliminary numerical experiments show that the proposed method can give not only state-of-the-art point estimates of image restoration but also useful associated uncertainty information.
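The three-level split can be illustrated with a linear-Gaussian stand-in, assumed here in the spirit of probabilistic PCA rather than taken from the paper's networks. The deterministic level is a full-dimensional mean, the stochastic level is a low-dimensional latent `z`, and a linear "merging" map lifts `z` back to pixel space. The names `sample_residuals` and `mixing` and all sizes are hypothetical.

```python
import numpy as np


def sample_residuals(mean, mixing, noise_std, n_samples, rng):
    """Draw r = mean + mixing @ z + eps with a k-dimensional latent z (k << d).

    mean:   (d,) deterministic, full-dimensional prediction
    mixing: (d, k) linear merging map from the latent space to pixel space
    """
    k = mixing.shape[1]
    z = rng.normal(size=(n_samples, k))                             # stochastic low-dimensional level
    eps = rng.normal(scale=noise_std, size=(n_samples, mean.size))  # small isotropic remainder
    return mean + z @ mixing.T + eps                                # merging level


rng = np.random.default_rng(0)
d, k = 64, 3
mean = np.zeros(d)
mixing = 0.3 * rng.normal(size=(d, k))
samples = sample_residuals(mean, mixing, noise_std=0.05, n_samples=20000, rng=rng)
implied_cov = mixing @ mixing.T + 0.05 ** 2 * np.eye(d)
```

Each draw costs O(dk) rather than the O(d^2) needed to sample from a dense covariance. The implied covariance is low-rank plus diagonal, which acts as a structural regularizer on the distribution.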
Scalable Bayesian Non-linear Matrix Completion
Qin, Xiangju, Blomstedt, Paul, Kaski, Samuel
Matrix completion aims to predict missing elements in a partially observed data matrix, which in typical applications, such as collaborative filtering, is large and extremely sparsely observed. A standard solution is matrix factorization, which predicts unobserved entries as linear combinations of latent variables. We generalize to nonlinear combinations in massive-scale matrices. Bayesian approaches have been proven beneficial in linear matrix completion but have not been applied in the more general nonlinear case, due to limited scalability. We introduce a Bayesian nonlinear matrix completion algorithm, which is based on a recent Bayesian formulation of Gaussian process latent variable models. To solve the challenges regarding scalability and computation, we propose a data-parallel distributed computational approach with a restricted communication scheme. We evaluate our method on challenging out-of-matrix prediction tasks using both simulated and real-world data.
1 Introduction
In matrix completion, one of the most widely used approaches for collaborative filtering, the objective is to predict missing elements of a partially observed data matrix.
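The linear baseline the abstract generalizes predicts each entry as a linear combination (dot product) of row and column latent vectors. The sketch below fits that baseline with ridge-regularized alternating least squares on the observed entries only. It is not the authors' Bayesian GP-LVM method. `als_complete`, the rank, and the regularization value are hypothetical choices.

```python
import numpy as np


def als_complete(R, mask, rank, lam=1e-3, iters=200, seed=0):
    """Fill in R (observed where mask is True) as U @ V.T via alternating least squares."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(size=(n, rank))
    V = rng.normal(size=(m, rank))
    ridge = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(n):                      # solve each row's latent vector with V fixed
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, mask[i]])
        for j in range(m):                      # solve each column's latent vector with U fixed
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, j], j])
    return U @ V.T                              # every entry is a linear combination of latents


rng = np.random.default_rng(1)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # synthetic rank-2 matrix
mask = rng.random(true.shape) < 0.6                          # roughly 60% of entries observed
observed = np.where(mask, true, 0.0)
pred = als_complete(observed, mask, rank=2)
rmse_missing = float(np.sqrt(np.mean((pred - true)[~mask] ** 2)))
```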
Neural Network based Explicit Mixture Models and Expectation-maximization based Learning
Liu, Dong, Vu, Minh Thành, Chatterjee, Saikat, Rasmussen, Lars K.
We propose two neural network based mixture models in this article. The proposed mixture models are explicit in nature: they have analytical forms, which make the likelihood computable and sample generation efficient. Computation of likelihood is an important aspect of our models. Expectation-maximization based algorithms are developed for learning parameters of the proposed models. We provide sufficient conditions to realize the expectation-maximization based learning. The main requirements are invertibility of the neural networks used as generators and computation of the Jacobians of their functional forms. The requirements are practically realized using a flow-based neural network. In our first mixture model, we use multiple flow-based neural networks as generators. Naturally, the model is complex. A single latent variable is used as the common input to all the neural networks. The second mixture model uses a single flow-based neural network as a generator to reduce complexity. The single generator has a latent variable input that follows a Gaussian mixture distribution. We demonstrate the efficiency of the proposed mixture models through extensive experiments on sample generation and maximum likelihood based classification.
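The abstract's learning scheme rests on expectation-maximization. The sketch below shows plain EM for a one-dimensional Gaussian mixture, which makes the E-step/M-step structure concrete. The paper's flow-based generators and Jacobian terms are deliberately left out, so this is the classical special case rather than the proposed models. `em_gmm_1d` and the synthetic data are illustrative.

```python
import numpy as np


def em_gmm_1d(x, k=2, iters=200):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
weights, means, variances = em_gmm_1d(x, k=2)
```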
Multi-agent Inverse Reinforcement Learning for Two-person Zero-sum Games
Lin, Xiaomin, Beling, Peter A., Cogill, Randy
The focus of this paper is a Bayesian framework for solving a class of problems termed multi-agent inverse reinforcement learning (MIRL). Compared to the well-known inverse reinforcement learning (IRL) problem, MIRL is formalized in the context of stochastic games, which generalize Markov decision processes to game theoretic scenarios. We establish a theoretical foundation for competitive two-agent zero-sum MIRL problems and propose a Bayesian solution approach in which the generative model is based on an assumption that the two agents follow a minimax bi-policy. Numerical results are presented comparing the Bayesian MIRL method with two existing methods in the context of an abstract soccer game. Investigation centers on relationships between the extent of prior information and the quality of learned rewards. Results suggest that covariance structure is more important than mean value in reward priors.
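The generative model assumes both agents follow a minimax bi-policy. In the simplest case, a single-state zero-sum game with two actions per player, that policy pair has a closed form, sketched below. The paper works with full stochastic games, so this is only the one-state building block. The payoff matrix and `minimax_2x2` are hypothetical. The function assumes the game has no pure-strategy saddle point and raises an error when the formula falls outside valid probabilities.

```python
def minimax_2x2(payoff):
    """Mixed minimax strategies for a 2x2 zero-sum game without a saddle point.

    payoff[i][j] is the row player's reward; the column player receives its negative.
    Returns (p, q, value): P(row player picks row 0), P(column player picks column 0),
    and the value of the game to the row player.
    """
    (a, b), (c, d) = payoff
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("no interior mixed equilibrium; check for a saddle point")
    p = (d - c) / denom          # makes the column player indifferent between columns
    q = (d - b) / denom          # makes the row player indifferent between rows
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("game has a saddle point; use pure strategies instead")
    value = (a * d - b * c) / denom
    return p, q, value


p, q, value = minimax_2x2([[3, -1], [-2, 1]])   # p = 3/7, q = 2/7, value = 1/7
```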
Variational f-divergence Minimization
Zhang, Mingtian, Bird, Thomas, Habib, Raza, Xu, Tianlin, Barber, David
Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific f-divergence between the model and data distribution. In light of recent successes in training Generative Adversarial Networks, alternative non-likelihood training criteria have been proposed. Whilst not necessarily statistically efficient, these alternatives may better match user requirements such as sharp image generation. A general variational method for training probabilistic latent variable models using maximum likelihood is well established; however, how to train latent variable models using other f-divergences is comparatively unknown. We discuss a variational approach that, when combined with the recently introduced Spread Divergence, can be applied to train a large class of latent variable models using any f-divergence.
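The link between maximum likelihood and f-divergences can be checked on discrete distributions. The divergence is D_f(P || Q) = sum_x q(x) f(p(x)/q(x)). With f(t) = t log t, P the data distribution, and Q the model, D_f(P || Q) = KL(P || Q), which maximum likelihood minimizes. The sketch below shows only this definition, not the paper's variational bound or the Spread Divergence. The helper names are hypothetical.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def kl_f(t):
    # f(t) = t log t recovers KL(P || Q); 0 log 0 is taken as 0.
    return t * math.log(t) if t > 0 else 0.0


def reverse_kl_f(t):
    # f(t) = -log t recovers KL(Q || P); requires p(x) > 0 wherever q(x) > 0.
    return -math.log(t)


def total_variation_f(t):
    # f(t) = |t - 1| / 2 recovers the total variation distance.
    return 0.5 * abs(t - 1.0)


p_data = [0.5, 0.5]
q_model = [0.25, 0.75]
kl = f_divergence(p_data, q_model, kl_f)
reverse_kl = f_divergence(p_data, q_model, reverse_kl_f)
tv = f_divergence(p_data, q_model, total_variation_f)
```

Swapping `f` changes the training criterion without changing anything else, which is the flexibility the abstract aims to extend to latent variable models.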
Adaptively stacking ensembles for influenza forecasting with incomplete data
McAndrew, Thomas, Reich, Nicholas G.
Seasonal influenza infects between 10 and 50 million people in the United States every year, overburdening hospitals during weeks of peak incidence. Named by the CDC as an important tool to fight the damaging effects of these epidemics, accurate forecasts of influenza and influenza-like illness (ILI) forewarn public health officials about when, and where, seasonal influenza outbreaks will hit hardest. Multi-model ensemble forecasts, which are weighted combinations of component models, have shown positive results in forecasting. Ensemble forecasts of influenza outbreaks have been static: they train on all past ILI data at the beginning of a season, generate a set of optimal weights for each model in the ensemble, and keep the weights constant. We propose an adaptive ensemble forecast that (i) changes model weights week-by-week throughout the influenza season, (ii) only needs the current influenza season's data to make predictions, and (iii) by introducing a prior distribution, shrinks weights toward the reference equal weighting approach and adjusts for observed ILI percentages that are subject to future revisions. We investigate the prior's ability to impact adaptive ensemble performance and, after finding an optimal prior via a cross-validation approach, compare our adaptive ensemble's performance to equal-weighted and static ensembles. Applied to forecasts of short-term ILI incidence at the regional and national level in the US, our adaptive model outperforms a naive equal-weighted ensemble and performs similarly to or better than the static ensemble, which requires multiple years of training data. Adaptive ensembles are able to quickly train and forecast during epidemics, and provide a practical tool to public health officials looking for forecasts that can conform to unique features of a specific season.
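The abstract does not state the exact inference procedure, so the sketch below rests on assumptions. Stacking weights are fit by maximum a posteriori EM under a symmetric Dirichlet prior. A larger `prior_alpha` pulls the weights toward equal weighting, and re-fitting as each new week of data arrives gives week-by-week weights. `ensemble_weights` and the density scores are hypothetical, and the revision adjustment mentioned in the abstract is not modelled.

```python
import numpy as np


def ensemble_weights(dens, prior_alpha=1.0, iters=500):
    """MAP mixture weights for a stacked ensemble under a symmetric Dirichlet prior.

    dens[t, m]: predictive density that component model m assigned to the observed
    ILI value in week t. prior_alpha >= 1; larger values shrink the weights toward
    equal weighting, and prior_alpha = 1 reduces to maximum likelihood.
    """
    if prior_alpha < 1.0:
        raise ValueError("prior_alpha must be >= 1 for this EM update")
    n_weeks, n_models = dens.shape
    w = np.full(n_models, 1.0 / n_models)
    for _ in range(iters):
        resp = w * dens                                   # E-step: credit each model per week
        resp /= resp.sum(axis=1, keepdims=True)
        w = (resp.sum(axis=0) + prior_alpha - 1.0) / (n_weeks + n_models * (prior_alpha - 1.0))
    return w


dens = np.array([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]])   # hypothetical scores: 3 weeks, 2 models
weekly = [ensemble_weights(dens[: t + 1], prior_alpha=2.0) for t in range(len(dens))]
```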
Bayesian Robustness: A Nonasymptotic Viewpoint
Bhatia, Kush, Ma, Yi-An, Dragan, Anca D., Bartlett, Peter L., Jordan, Michael I.
The goal is to capture the sensitivity of inferential procedures to the presence of outliers in the data and misspecifications in the modelling assumptions, and to mitigate overly large sensitivity. The Bayesian approach has focused on capturing possible anomalies in the observed data via the model and on choosing priors that have minimal effect on inferences. The frequentist approach, on the other hand, has focused on the development of estimators that identify and guard against outliers in the data. We refer the reader to [Hub11, Chap. 15] for a comprehensive discussion.
von Neumann-Morgenstern and Savage Theorems for Causal Decision Making
Gonzalez-Soto, Mauricio, Sucar, Luis E., Escalante, Hugo J.
Decision making under uncertain conditions has been well studied when uncertainty can only be considered at the associative level of information. The classical Theorems of von Neumann-Morgenstern and Savage provide a formal criterion for rationally making choices using associative information. We present here a previous result due to Pearl and show that it can be considered a causal version of the von Neumann-Morgenstern Theorem; furthermore, we consider the case when the true causal mechanism that controls the environment is unknown to the decision maker and propose a causal version of the Savage Theorem. As applications, we argue that previous optimal action learning methods for causal environments fit within the Causal Savage Theorem we present, thus showing the utility of our result in the justification and design of learning algorithms; furthermore, we define Causal Nash Equilibria for a strategic game in a causal environment in terms of the preferences induced by our Causal Decision Making Theorem.
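The causal von Neumann-Morgenstern criterion can be illustrated with a small sketch under an assumed reading of the abstract. Each action is treated as a lottery over outcomes given by its interventional distribution P(outcome | do(a)). The rational choice is then the action with the highest expected utility under that lottery. The action names, probabilities, and utilities are hypothetical.

```python
def expected_utility(outcome_probs, utility):
    """Expected utility of a lottery over a fixed outcome order."""
    return sum(p * u for p, u in zip(outcome_probs, utility))


def causal_best_action(interventional, utility):
    """Pick the action maximizing E[U | do(a)].

    interventional[a] lists P(outcome | do(a)) over the same outcome order as `utility`.
    """
    scores = {a: expected_utility(probs, utility) for a, probs in interventional.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-action example with outcomes (recovered, not recovered).
interventional = {"treat": [0.7, 0.3], "no_treat": [0.5, 0.5]}
utility = [1.0, 0.0]
best, scores = causal_best_action(interventional, utility)
```

In the Savage-style setting, where the causal mechanism is unknown, the interventional probabilities would themselves come from the decision maker's beliefs rather than being given. The scoring step stays the same.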