Bayesian Inference
Locally Private Bayesian Inference for Count Models
Schein, Aaron, Wu, Zhiwei Steven, Zhou, Mingyuan, Wallach, Hanna
As more aspects of social interaction are digitally recorded, there is a growing need to develop privacy-preserving data analysis methods. Social scientists will be more likely to adopt these methods if doing so entails minimal change to their current methodology. Toward that end, we present a general and modular method for privatizing Bayesian inference for Poisson factorization, a broad class of models that contains some of the most widely used models in the social sciences. Our method satisfies local differential privacy, which ensures that no single centralized server need ever store the non-privatized data. To formulate our local-privacy guarantees, we introduce and focus on limited-precision local privacy, the local privacy analog of limited-precision differential privacy (Flood et al., 2013). We present two case studies, one involving social networks and one involving text corpora, that test our method's ability to form the posterior distribution over latent variables under different levels of noise, and demonstrate our method's utility over a naïve approach, wherein inference proceeds as usual, treating the privatized data as if it were not privatized.
Robust and Parallel Bayesian Model Selection
Zhang, Michael Minyi, Lam, Henry, Lin, Lizhen
Being able to select the right model for inference is a crucial task. As our main example, we consider model selection for a normal linear model: $Y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$, (1) where $Y$ is an $N$-dimensional response vector, $X$ is an $N \times D$ design matrix, and $\beta$ is a $D$-dimensional vector of regression parameters. Here the candidate models to be selected could refer to the sets of significant variables. In a Bayesian setting, we have a natural probabilistic evaluation of models through posterior model probabilities. Depending on the objectives of the data analysis, we may be interested in assessing the belief on which is the "best" model or obtaining predictions with minimum error. Existing procedures to accomplish the aforementioned goals, however, will perform poorly in the presence of outliers and contamination. In addition, Markov chain Monte Carlo (MCMC) algorithms for these methods do not scale to big data situations. The goal of this paper is to investigate a "divide-and-conquer" method that integrates with existing Bayesian model selection techniques, in a way that is robust to outliers and, moreover, allows us to perform Bayesian model selection in parallel.
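The posterior model probabilities mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's divide-and-conquer procedure; it only shows the standard Bayes-rule normalization over candidate models, assuming the log marginal likelihoods have already been computed by some other means. The function name posterior_model_probs and the numeric values are hypothetical.

```python
import math

def posterior_model_probs(log_marginals, log_priors=None):
    """Normalize log p(Y | M_k) + log p(M_k) into posterior model probabilities.

    log_marginals: list of log marginal likelihoods, one per candidate model.
    log_priors: optional list of log prior model probabilities
                (a uniform prior over models is assumed when omitted).
    """
    k = len(log_marginals)
    if log_priors is None:
        log_priors = [-math.log(k)] * k
    scores = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods for three candidate variable subsets.
log_marginals = [-105.0, -100.0, -102.0]
probs = posterior_model_probs(log_marginals)
best_model = max(range(len(probs)), key=probs.__getitem__)
```

Working in log space matters in practice because marginal likelihoods for even moderately sized data sets are far too small to exponentiate directly.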
Bayesian Q-learning with Assumed Density Filtering
Jeong, Heejin (University of Pennsylvania) | Lee, Daniel D. (University of Pennsylvania)
While off-policy temporal difference (TD) methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation introduces non-linearity and inconsistent distributions over the value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs are not only used in exploration but also provide a natural regularization in the belief updates. We also present a connection between ADFQ and Q-learning. Our empirical results show that the proposed ADFQ algorithms outperform the compared algorithms in several task domains. Moreover, our algorithms address general drawbacks of Bayesian reinforcement learning (BRL) such as efficiency, usage of uncertainty, and nonlinearity.
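For readers unfamiliar with the max operator mentioned in this abstract, the following is a minimal sketch of the ordinary (non-Bayesian) tabular Q-learning update. It is not ADFQ, which maintains beliefs over Q rather than point estimates. The function name, the table layout, and the step size and discount values are assumptions chosen for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new Q[s][a].

    Q is a list of lists indexed as Q[state][action].
    alpha is the step size and gamma the discount factor (illustrative values).
    """
    # Bellman optimality target: the max over next actions is the
    # non-linearity that complicates a Bayesian treatment.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Two states, two actions; the values are hypothetical.
Q = [[0.0, 0.0],
     [1.0, 2.0]]
new_value = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Because the target takes a maximum over point estimates, replacing those estimates with distributions requires an approximation of the resulting non-Gaussian posterior, which is the role the abstract assigns to Assumed Density Filtering.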
A Survey on Application of Machine Learning Techniques in Optical Networks
Musumeci, Francesco, Rottondi, Cristina, Nag, Avishek, Macaluso, Irene, Zibar, Darko, Ruffini, Marco, Tornatore, Massimo
Today, the amount of data that can be retrieved from communications networks is extremely high and diverse (e.g., data regarding users' behavior, traffic traces, network alarms, signal quality indicators, etc.). Advanced mathematical tools are required to extract useful information from this large set of network data. In particular, Machine Learning (ML) is regarded as a promising methodological area to perform network-data analysis and enable, e.g., automated network self-configuration and fault management. In this survey we classify and describe relevant studies dealing with the applications of ML to optical communications and networking. Optical networks and systems are facing unprecedented growth in complexity due to the introduction of a huge number of adjustable parameters (such as routing configurations, modulation format, symbol rate, coding schemes, etc.), mainly due to the adoption of, among others, coherent transmission/reception technology and advanced digital signal processing, and to the presence of nonlinear effects in optical fiber systems. Although a good number of research papers have appeared in recent years, the application of ML to optical networks is still in its early stage. In this survey we provide an introductory reference for researchers and practitioners interested in this field. To stimulate further work in this area, we conclude the paper by proposing possible new research directions.
Efficient Structure Learning and Sampling of Bayesian Networks
Kuipers, Jack, Suter, Polina, Moffa, Giusi
Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG), is highly challenging, mainly due to the vast number of possible networks. Efforts have focussed on two fronts: constraint based methods that perform conditional independence tests to exclude edges, and score and search approaches which explore the DAG space with greedy or MCMC schemes. Here we synthesise these two fields in a novel hybrid method which reduces the complexity of MCMC approaches to that of a constraint based method. Individual steps in the MCMC scheme only require simple table lookups so that very long chains can be efficiently obtained. Furthermore, the scheme includes an iterative procedure to correct for errors from the conditional independence tests. The algorithm not only offers markedly superior performance to alternatives, but DAGs can also be sampled from the posterior distribution, enabling full Bayesian model averaging for much larger Bayesian networks.
Madrid Advanced Statistics and Data Mining Summer School
The Madrid ASDM summer school is in its thirteenth edition this year, with hundreds of students from all over the world having attended so far. It comprises 12 intensive (15 lecture hours) week-long courses, and a student may attend between one and six courses. The courses cover topics such as Neural Networks and Deep Learning, Bayesian Networks, Big Data with Apache Spark, Bayesian Inference, Text Mining and Time Series. Each course has theoretical and practical classes, the latter done with R or Python. While the summer school is mainly attended by people from academia (PhD students and researchers), people from industry also attend.
Uncertainty Estimation via Stochastic Batch Normalization
Atanov, Andrei, Ashukha, Arsenii, Molchanov, Dmitry, Neklyudov, Kirill, Vetrov, Dmitry
In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization, an efficient approximation of the proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) on the MNIST and CIFAR-10 datasets.
Predictor Variable Prioritization in Nonlinear Models: A Genetic Association Case Study
Crawford, Lorin, Flaxman, Seth R., Runcie, Daniel E., West, Mike
The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel, interpretable, and computationally efficient way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other nonlinear methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and an Arabidopsis thaliana QTL mapping study, we show that applying RATE enables an explanation for this improved performance.
Momentum-Space Renormalization Group Transformation in Bayesian Image Modeling by Gaussian Graphical Model
Tanaka, Kazuyuki, Nakamura, Masamichi, Kataoka, Shun, Ohzeki, Masayuki, Yasuda, Muneki
A new Bayesian modeling method is proposed by combining the maximization of the marginal likelihood with a momentum-space renormalization group transformation for Gaussian graphical models. Moreover, we present a scheme for computing the statistical averages of hyperparameters and mean square errors in our proposed method based on a momentum-space renormalization transformation.
Basics of Bayesian Decision Theory
The use of formal statistical methods to analyse quantitative data in data science has increased considerably over the last few years. One such approach, Bayesian Decision Theory (BDT), also known as Bayesian Hypothesis Testing and Bayesian inference, is a fundamental statistical approach that quantifies the tradeoffs between various decisions using the distributions and costs that accompany such decisions. In pattern recognition, it is used to design classifiers under the assumption that the problem is posed in probabilistic terms and that all of the relevant probability values are known. Generally, we don't have such perfect information, but it is a good place to start when studying machine learning, statistical inference, and detection theory in signal processing. BDT also has many applications in science, engineering, and medicine.
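The tradeoff between distributions and costs described above can be made concrete with a small sketch. It assumes, as the passage does, that the priors, likelihoods, and costs are all known. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
def posterior(priors, likelihoods):
    """Posterior P(class | x) from priors P(class) and likelihoods p(x | class)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def bayes_decision(post, loss):
    """Pick the action with minimum expected cost under the posterior.

    loss[a][c] is the cost of taking action a when the true class is c.
    Returns (index of the best action, list of expected costs per action).
    """
    risks = [sum(cost * p for cost, p in zip(row, post)) for row in loss]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks

# Hypothetical two-class problem: class 0 is common, class 1 is rare.
priors = [0.9, 0.1]
likelihoods = [0.2, 0.6]      # p(x | class) for the observed x
loss = [[0.0, 10.0],          # action 0: costly if the true class is 1
        [1.0, 0.0]]           # action 1: mildly costly if the true class is 0

post = posterior(priors, likelihoods)
action, risks = bayes_decision(post, loss)
```

With these numbers the posterior favors class 0, yet the minimum-cost action is action 1, because a miss on class 1 is ten times as costly as a false alarm. This is the sense in which the decision depends on the costs as well as on the probabilities.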