Uncertainty
Distributed Learning for Cooperative Inference
Nediฤ, Angelia, Olshevsky, Alex, Uribe, Cรฉsar A.
In a distributed system, the interactions between agents are usually restricted to follow certain constraints on the flow of information imposed by the network structure. Such information constraints cause the agents to only be able to use locally available information. This contrasts with centralized approaches where all information and computation resources are available at a single location [24, 68, 64, 62]. One traditional problem in decision-making is that of parameter estimation or statistical learning. Given a set of noisy observations coming from a joint distribution one would like to estimate a parameter or distribution that minimizes a certain loss function. For example, Maximum a Posteriori (MAP) or Minimum Least Squared Error (MLSE) estimators fit a parameter to some model of the observations. Both, MAP and MLSE estimators require some form of Bayesian posterior computation based on models that explain the observations for a given parameter. Computation of such a posteriori distributions depends on having exact models about the likelihood of the corresponding observations.
Spatio-Temporal Modeling of Users' Check-ins in Location-Based Social Networks
Zarezade, Ali, Jafarzadeh, Sina, Rabiee, Hamid R.
People can upload a geotagged video, photo or text to social networks like Facebook and Twitter, share their present location on Foursquare or share their travel route using GPS trajectories to GeoLife [49]. A considerable amount of this spatiotemporal data is generated by the activity of users in location-based social networks (LBSN). In a typical LBSN, like Foursquare, users share the time and geolocation of their check-ins, comment about it, or unlock badges by exploring new venues. Many techniques have been proposed for processing, managing, and mining the trajectory data in the past decade [55]. Several other studies try to leverage the spatial data in recommender systems [23]. However, a few works have attempted to model the spatiotemporal behavior of users in LBSNs [5, 6]. Given the history of users' check-ins, the goal is to predict the time and location of This work is supported by ICT Innovation Center, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
Differentially Private Variational Inference for Non-conjugate Models
Jรคlkรถ, Joonas, Dikmen, Onur, Honkela, Antti
Many machine learning applications are based on data collected from people, such as their tastes and behaviour as well as biological traits and genetic data. Regardless of how important the application might be, one has to make sure individuals' identities or the privacy of the data are not compromised in the analysis. Differential privacy constitutes a powerful framework that prevents breaching of data subject privacy from the output of a computation. Differentially private versions of many important Bayesian inference methods have been proposed, but there is a lack of an efficient unified approach applicable to arbitrary models. In this contribution, we propose a differentially private variational inference method with a very wide applicability. It is built on top of doubly stochastic variational inference, a recent advance which provides a variational solution to a large class of models. We add differential privacy into doubly stochastic variational inference by clipping and perturbing the gradients. The algorithm is made more efficient through privacy amplification from subsampling. We demonstrate the method can reach an accuracy close to non-private level under reasonably strong privacy guarantees, clearly improving over previous sampling-based alternatives especially in the strong privacy regime.
Combining independent evidence using a Bayesian approach but without standard Bayesian updating?
I have made some progress with my work on combining independent evidence using a Bayesian approach but eschewing standard Bayesian updating. I found a neat analytical way of doing this, to a very good approximation, in cases where each estimate of a parameter corresponds to the ratio of two variables each determined with normal error, the fractional uncertainty in the numerator and denominator variables differing between the types of evidence. This seems a not uncommon situation in science, and it is a good approximation to that which exists when estimating climate sensitivity. I have had a manuscript in which I develop and test this method accepted by the Journal of Statistical Planning and Inference (for a special issue on Confidence Distributions edited by Tore Schweder and Nils Hjort). Frequentist coverage is almost exact using my analytical solution, based on combining Jeffreys' priors in quadrature, whereas Bayesian updating produces far poorer probability matching.
Mixed Graphical Models for Causal Analysis of Multi-modal Variables
Sedgewick, Andrew J, Ramsey, Joseph D., Spirtes, Peter, Glymour, Clark, Benos, Panayiotis V.
Graphical causal models are an important tool for knowledge discovery because they can represent both the causal relations between variables and the multivariate probability distributions over the data. Once learned, causal graphs can be used for classification, feature selection and hypothesis generation, while revealing the underlying causal network structure and thus allowing for arbitrary likelihood queries over the data. However, current algorithms for learning sparse directed graphs are generally designed to handle only one type of data (continuous-only or discrete-only), which limits their applicability to a large class of multi-modal biological datasets that include mixed type variables. To address this issue, we developed new methods that modify and combine existing methods for finding undirected graphs with methods for finding directed graphs. These hybrid methods are not only faster, but also perform better than the directed graph estimation methods alone for a variety of parameter settings and data set sizes. Here, we describe a new conditional independence test for learning directed graphs over mixed data types and we compare performances of different graph learning strategies on synthetic data.
Parameter Inference -- Maximum Aposteriori โ Towards Data Science โ Medium
In the previous post, we discussed the motivation behind Maximum Likelihood Estimate and how to calculate it. We also learned a few tricks about calculating the log likelihood of a function by citing the application of monotonic functions, and how they make the entire process of estimating the critical points of a function much easier as they preserve those critical points. Put the #tails (0) and #heads (2) in the equation of theta_MLE, This result tells us that the probability of next flip being Tails is 0 (i.e., it predicts that no flip is ever gonna turn up Tails the coin is always going to show Heads), and it is glaringly obvious that this is not the case (barring the extreme case where the coin is heavily loaded). Now, this poses a big problem in the Parameter Estimation process because it does not give us the accurate probability of the next flip. We know that even a fair coin has a 25% chance of showing two Heads in a row (0.5 x 0.5 0.25).
A Quasi-Bayesian Perspective to Online Clustering
Li, Le, Guedj, Benjamin, Loustau, Sรฉbastien
When faced with high frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (\emph{i.e.}, time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
Some Properties of Batch Value of Information in the Selection Problem
Shperberg, Shahaf S., Shimony, Solomon Eyal
Given a set of items of unknown utility, we need to select one with a utility as high as possible ("the selection problem"). Measurements (possibly noisy) of item values prior to selection are allowed, at a known cost. The goal is to optimize the overall sequential decision process of measurements and selection. Value of information (VOI) is a well-known scheme for selecting measurements, but the intractability of the problem typically leads to using myopic VOI estimates. Other schemes have also been proposed, some with approximation guarantees, based on submodularity criteria. However, it was observed that the VOI is not submodular in general. In this paper we examine theoretical properties of VOI for the selection problem, and identify cases of submodularity and supermodularity. We suggest how to use these properties to compute approximately optimal measurement batch policies, with an example based on a "wine selection problem".
Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models
Park, Sejun, Jang, Yunhun, Galanis, Andreas, Shin, Jinwoo, Stefankovic, Daniel, Vigoda, Eric
The Gibbs sampler is a particularly popular Markov chain used for learning and inference problems in Graphical Models (GMs). These tasks are computationally intractable in general, and the Gibbs sampler often suffers from slow mixing. In this paper, we study the Swendsen-Wang dynamics which is a more sophisticated Markov chain designed to overcome bottlenecks that impede the Gibbs sampler. We prove O(\log n) mixing time for attractive binary pairwise GMs (i.e., ferromagnetic Ising models) on stochastic partitioned graphs having n vertices, under some mild conditions, including low temperature regions where the Gibbs sampler provably mixes exponentially slow. Our experiments also confirm that the Swendsen-Wang sampler significantly outperforms the Gibbs sampler when they are used for learning parameters of attractive GMs.
Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness
Bucur, Ioan Gabriel, Claassen, Tom, Heskes, Tom
Causal effect estimation from observational data is an important and much studied research topic. The instrumental variable (IV) and local causal discovery (LCD) patterns are canonical examples of settings where a closed-form expression exists for the causal effect of one variable on another, given the presence of a third variable. Both rely on faithfulness to infer that the latter only influences the target effect via the cause variable. In reality, it is likely that this assumption only holds approximately and that there will be at least some form of weak interaction. This brings about the paradoxical situation that, in the large-sample limit, no predictions are made, as detecting the weak edge invalidates the setting. We introduce an alternative approach by replacing strict faithfulness with a prior that reflects the existence of many 'weak' (irrelevant) and 'strong' interactions. We obtain a posterior distribution over the target causal effect estimator which shows that, in many cases, we can still make good estimates. We demonstrate the approach in an application on a simple linear-Gaussian setting, using the MultiNest sampling algorithm, and compare it with established techniques to show our method is robust even when strict faithfulness is violated.