Bayesian Learning
Adversarial Neural Pruning
Madaan, Divyam, Hwang, Sung Ju
It is well known that neural networks are susceptible to adversarial perturbations and are also computationally and memory intensive which makes it difficult to deploy them in real-world applications where security and computation are constrained. In this work, we aim to obtain both robust and sparse networks that are applicable to such scenarios, based on the intuition that latent features have a varying degree of susceptibility to adversarial perturbations. Specifically, we define vulnerability at the latent feature space and then propose a Bayesian framework to prioritize features based on their contribution to both the original and adversarial loss, to prune vulnerable features and preserve the robust ones. Through quantitative evaluation and qualitative analysis of the perturbation to latent features, we show that our sparsification method is a defense mechanism against adversarial attacks and the robustness indeed comes from our model's ability to prune vulnerable latent features that are more susceptible to adversarial perturbations.
Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables
Salehkaleybar, Saber, Ghassami, AmirEmad, Kiyavash, Negar, Zhang, Kun
We consider the problem of learning causal models from observational data generated by linear non-Gaussian acyclic causal models with latent variables. Without considering the effect of latent variables, one usually infers wrong causal relationships among the observed variables. Under faithfulness assumption, we propose a method to check whether there exists a causal path between any two observed variables. From this information, we can obtain the causal order among them. The next question is then whether or not the causal effects can be uniquely identified as well. It can be shown that causal effects among observed variables cannot be identified uniquely even under the assumptions of faithfulness and non-Gaussianity of exogenous noises. However, we will propose an efficient method to identify the set of all possible causal effects that are compatible with the observational data. Furthermore, we present some structural conditions on the causal graph under which we can learn causal effects among observed variables uniquely. We also provide necessary and sufficient graphical conditions for unique identification of the number of variables in the system. Experiments on synthetic data and real-world data show the effectiveness of our proposed algorithm on learning causal models.
Supervised Negative Binomial Classifier for Probabilistic Record Linkage
K, Harish Kashyap, Byadarhaly, Kiran, Shah, Saumya
Motivated by the need of the linking records across various databases, we propose a novel graphical model based classifier that uses a mixture of Poisson distributions with latent variables. The idea is to derive insight into each pair of hypothesis records that match by inferring its underlying latent rate of error using Bayesian Modeling techniques. The novel approach of using gamma priors for learning the latent variables along with supervised labels is unique and allows for active learning. The naive assumption is made deliberately as to the independence of the fields to propose a generalized theory for this class of problems and not to undermine the hierarchical dependencies that could be present in different scenarios. This classifier is able to work with sparse and streaming data. The application to record linkage is able to meet several challenges of sparsity, data streams and varying nature of the data-sets.
Generalization Error Bounds for Deep Variational Inference
Chรฉrief-Abdellatif, Badr-Eddine
Variational inference is becoming more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Meanwhile, a few recent works have provided theoretical justification and new insights on deep neural networks for estimating smooth functions in usual settings such as nonparametric regression. In this paper, we show that variational inference for sparse deep learning retains the same generalization properties than exact Bayesian inference. In particular, we highlight the connection between estimation and approximation theories via the classical bias-variance trade-off and show that it leads to near-minimax rates of convergence for H\"older smooth functions. Additionally, we show that the model selection framework over the neural network architecture via ELBO maximization does not overfit and adaptively achieves the optimal rate of convergence.
Robust data-driven approach for predicting the configurational energy of high entropy alloys
Zhang, Jiaxin, Liu, Xianglin, Bi, Sirui, Yin, Junqi, Zhang, Guannan, Eisenbach, Markus
High entropy alloys (HEAs) have been increasingly attractive as promising next-generation materials due to their various excellent properties. It's necessary to essentially characterize the degree of chemical ordering and identify order-disorder transitions through efficient simulation and modeling of thermodynamics. In this study, a robust data-driven framework based on Bayesian approaches is proposed and demonstrated on the accurate and efficient prediction of configurational energy of high entropy alloys. The proposed effective pair interaction (EPI) model with ensemble sampling is used to map the configuration and its corresponding energy. The US Government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. Compared with the arbitrary determination of model complexity, we further conduct a physical feature selection to identify the truncation of coordination shells in EPI model using Bayesian information criterion. The results achieve efficient and robust performance in predicting the configurational energy, particularly given small data. The developed methodology is applied to study a series of refractory HEAs, i.e. NbMoTaW, NbMoTaWV and NbMoTaWTi where it is demonstrated how dataset size affects the confidence we can place in statistical estimates of configurational energy when data are sparse. Introduction As one of the typical multicomponent alloys, high entropy alloys (HEAs) consisting of four or more principal elements have been widely studied due to their exceptional mechanical properties [1, 2, 3, 4].
Probabilistic Models with Deep Neural Networks
Masegosa, Andrรฉs R., Cabaรฑas, Rafael, Langseth, Helge, Nielsen, Thomas D., Salmerรณn, Antonio
Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference originated in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow to apply probabilistic modeling over massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances in conjunction with the release of novel probabilistic modeling toolboxes have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework.
Variational Bayes on Manifolds
Tran, Minh-Ngoc, Nguyen, Dang H., Nguyen, Duy
Variational Bayes (VB) has become a versatile tool for Bayesian inference in statistics. Nonetheless, the development of the existing VB algorithms is so far generally restricted to the case where the variational parameter space is Euclidean, which hinders the potential broad application of VB methods. This paper extends the scope of VB to the case where the variational parameter space is a Riemannian manifold. We develop, for the first time in the literature, an efficient manifold-based VB algorithm that exploits both the geometric structure of the constraint parameter space and the information geometry of the manifold of VB approximating probability distributions. Our algorithm is provably convergent and achieves a convergence rate of order $\mathcal O(1/\sqrt{T})$ and $\mathcal O(1/T^{2-2\epsilon})$ for a non-convex evidence lower bound function and a strongly retraction-convex evidence lower bound function, respectively. We develop in particular two manifold VB algorithms, Manifold Gaussian VB and Manifold Neural Net VB, and demonstrate through numerical experiments that the proposed algorithms are stable, less sensitive to initialization and compares favourably to existing VB methods.
Agglomerative Fast Super-Paramagnetic Clustering
Concretely, that the proposed algorithm does in fact recover the correct super-paramagnetic cluster configurations that are near the entropy maxima. Previous cases studies include data clustering of stocks [15] and gene data in [4], temporal states of financial markets [8], and state-detection for adaptive machine learning in trading [5]. There is an endless variety of potential use-cases for this type of fast big-data clustering technology. Building on prior work we propose and demonstrate an alternative to fast Super-Paramagnetic Clustering (f-SPC) [15] using a modern and streamlined implementation of the "Merging Algorithm" first suggested by Gi-ada [4], one that can recover the same cluster configurations for a variety of test-cases, but with significantly reduced compute times. We again use the Noh Ansatz [11] and the Maximum Likelihood Estimation approach introduced by Giada and Marsili [4]. We call the new algorithm Agglomerative Super-Paramagnetic Clustering (ASPC) and it has the benefit of being less computationally expensive than the PGAs implemented in [5, 6, 15].
Paired-Consistency: An Example-Based Model-Agnostic Approach to Fairness Regularization in Machine Learning
Horesh, Yair, Haas, Noa, Mishraky, Elhanan, Resheff, Yehezkel S., Lador, Shir Meir
As AI systems develop in complexity it is becoming increasingly hard to ensure non-discrimination on the basis of protected attributes such as gender, age, and race. Many recent methods have been developed for dealing with this issue as long as the protected attribute is explicitly available for the algorithm. We address the setting where this is not the case (with either no explicit protected attribute, or a large set of them). Instead, we assume the existence of a fair domain expert capable of generating an extension to the labeled dataset - a small set of example pairs, each having a different value on a subset of protected variables, but judged to warrant a similar model response. We define a performance metric - paired consistency. Paired consistency measures how close the output (assigned by a classifier or a regressor) is on these carefully selected pairs of examples for which fairness dictates identical decisions. In some cases consistency can be embedded within the loss function during optimization and serve as a fairness regularizer, and in others it is a tool for fair model selection. We demonstrate our method using the well studied Income Census dataset.
Strengthening the Case for a Bayesian Approach to Car-following Model Calibration and Validation using Probabilistic Programming
Abodo, Franklin, Berthaume, Andrew, Zitzow-Childs, Stephen, Bobadilla, Leonardo
-- Compute and memory constraints have historically prevented traffic simulation software users from fully utilizing the predictive models underlying them. When calibrating car-following models, particularly, accommodations have included 1) using sensitivity analysis to limit the number of parameters to be calibrated, and 2) identifying only one set of parameter values using data collected from multiple car-following instances across multiple drivers. Shortcuts are further motivated by insufficient data set sizes, for which a driver may have too few instances to fully account for the variation in their driving behavior . In this paper, we demonstrate that recent technological advances can enable transportation researchers and engineers to overcome these constraints and produce calibration results that 1) outperform industry standard approaches, and 2) allow for a unique set of parameters to be estimated for each driver in a data set, even given a small amount of data. We propose a novel calibration procedure for car-following models based on Bayesian machine learning and probabilistic programming, and apply it to real-world data from a naturalistic driving study. We also discuss how this combination of mathematical and software tools can offer additional benefits such as more informative model validation and the incorporation of true-to-data uncertainty into simulation traces. Traffic simulation software packages are widely used in transportation engineering to estimate the impacts of potential changes to a roadway network and forecast system performance under future scenarios. These packages are underpinned by math-and physics-based models, which are designed to describe behavior at an aggregate (macroscopic) level or at the level of individual drivers (microscopic).