Uncertainty
How Bayesian Networks Are Superior in Understanding Effects of Variables
Bayes Nets (or Bayesian Networks) give remarkable results in determining the effects of many variables on an outcome. They typically perform strongly even in cases when other methods falter or fail. These networks have had relatively little use with business-related problems, although they have worked successfully for years in fields such as scientific research, public safety, aircraft guidance systems and national defense. Importantly, they often outperform regression, particularly in determining variables' effects. Regression is one of the most august multivariate methods, and among the most studied and applied.
Gaussian Process Latent Variable Alignment Learning
Kazlauskaite, Ieva, Ek, Carl Henrik, Campbell, Neill D. F.
We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We show results on several datasets, including different motion capture sequences and show that the suggested model outperform the classical algorithmic approaches to the alignment task.
Bayesian Quadrature for Multiple Related Integrals
Xi, Xiaoyue, Briol, Franรงois-Xavier, Girolami, Mark
Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional advantages, such as the possibility of transferring information between several numerical methods. This allows users to represent uncertainty in a more faithful manner and, as a by-product, provide increased numerical efficiency. We propose the first such numerical method by extending the well-known Bayesian quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.
Boosting Variational Inference: an Optimization Perspective
Locatello, Francesco, Khanna, Rajiv, Ghosh, Joydeep, Rรคtsch, Gunnar
Variational inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, boosting variational inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. However, as is the case with many other variational inference algorithms, its theoretical properties have not been studied. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analyses yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications. Since a lot of focus in previous works for variational inference has been on tractability, our work is especially important as a much needed attempt to bridge the gap between probabilistic models and their corresponding theoretical properties.
Variational Bayes In Private Settings (VIPS)
Park, Mijung, Foulds, James, Chaudhuri, Kamalika, Welling, Max
Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilistic models, called the Conjugate Exponential (CE) family. We observe that we can straightforwardly privatise VB's approximate posterior distributions for models in the CE family, by perturbing the expected sufficient statistics of the complete-data likelihood. For a broadly-used class of non-CE models, those with binomial likelihoods, we show how to bring such models into the CE family, such that inferences in the modified model resemble the private variational Bayes algorithm as closely as possible, using the Polya-Gamma data augmentation scheme. The iterative nature of variational Bayes presents a further challenge since iterations increase the amount of noise needed. We overcome this by combining: (1) an improved composition method for differential privacy, called the moments accountant, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets.
Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend to use it for sparse data instead of BDeu. Finally, will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
On Discrimination Discovery and Removal in Ranked Data using Causal Graph
Wu, Yongkai, Zhang, Lu, Wu, Xintao
Predictive models learned from historical data are widely used to help companies and organizations make decisions. However, they may digitally unfairly treat unwanted groups, raising concerns about fairness and discrimination. In this paper, we study the fairness-aware ranking problem which aims to discover discrimination in ranked datasets and reconstruct the fair ranking. Existing methods in fairness-aware ranking are mainly based on statistical parity that cannot measure the true discriminatory effect since discrimination is causal. On the other hand, existing methods in causal-based anti-discrimination learning focus on classification problems and cannot be directly applied to handle the ranked data. To address these limitations, we propose to map the rank position to a continuous score variable that represents the qualification of the candidates. Then, we build a causal graph that consists of both the discrete profile attributes and the continuous score. The path-specific effect technique is extended to the mixed-variable causal graph to identify both direct and indirect discrimination. The relationship between the path-specific effects for the ranked data and those for the binary decision is theoretically analyzed. Finally, algorithms for discovering and removing discrimination from a ranked dataset are developed. Experiments using the real dataset show the effectiveness of our approaches.
Hierarchical Modeling and Shrinkage for User Session Length Prediction in Media Streaming
Dedieu, Antoine, Mazumder, Rahul, Zhu, Zhen, Vahabi, Hossein
An important metric of users' satisfaction and engagement within on-line streaming services is the user session length, i.e. the amount of time they spend on a service continuously without interruption. Being able to predict this value directly benefits the recommendation and ad pacing contexts in music and video streaming services. Recent research has shown that predicting the exact amount of time spent is highly nontrivial due to many external factors for which a user can end a session, and the lack of predictive covariates. Most of the other related literature on duration based user engagement has focused on dwell time for websites, for search and display ads, mainly for post-click satisfaction prediction or ad ranking. In this work we present a novel framework inspired by hierarchical Bayesian modeling to predict, at the moment of login, the amount of time a user will spend in the streaming service. The time spent by a user on a platform depends upon user-specific latent variables which are learned via hierarchical shrinkage. Our framework enjoys theoretical guarantees, naturally incorporates flexible parametric/nonparametric models on the covariates and is found to outperform state-of- the-art estimators in terms of efficiency and predictive performance on real world datasets.
DAGs with NO TEARS: Smooth Optimization for Structure Learning
Zheng, Xun, Aragam, Bryon, Ravikumar, Pradeep, Xing, Eric P.
Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint and are not well-suited to general purpose optimization packages for their solution. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a smooth, constrained optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting nonconvex, constrained program involves smooth functions whose gradients are easy to compute and only involve elementary matrix operations. By using existing black-box optimization routines, our method uses global search to find an optimal DAG and can be implemented in about 50 lines of Python and outperforms existing methods without imposing any structural constraints.
Deep Network Regularization via Bayesian Inference of Synaptic Connectivity
Partaourides, Harris, Chatzis, Sotirios P.
Deep neural networks (DNNs) often require good regularizers to generalize well. Currently, state-of-the-art DNN regularization techniques consist in randomly dropping units and/or connections on each iteration of the training algorithm. Dropout and DropConnect are characteristic examples of such regularizers, that are widely popular among practitioners. However, a drawback of such approaches consists in the fact that their postulated probability of random unit/connection omission is a constant that must be heuristically selected based on the obtained performance in some validation set. To alleviate this burden, in this paper we regard the DNN regularization problem from a Bayesian inference perspective: We impose a sparsity-inducing prior over the network synaptic weights, where the sparsity is induced by a set of Bernoulli-distributed binary variables with Beta (hyper-)priors over their prior parameters. This way, we eventually allow for marginalizing over the DNN synaptic connectivity for output generation, thus giving rise to an effective, heuristics-free, network regularization scheme. We perform Bayesian inference for the resulting hierarchical model by means of an efficient Black-Box Variational inference scheme. We exhibit the advantages of our method over existing approaches by conducting an extensive experimental evaluation using benchmark datasets.