Learning Graphical Models
How to Approach a Data Intensive Problem
"It is a capital mistake to theorise before one has data." Are you stuck with a problem? Previously I have written a general introduction about predictive functions, and where you might use them for providing "killer" features in your applications. I argued that the data analytics will be a part of the modern software engineering. In both of these disciplines problem solving is an essential skill, and harder are the problems you can crack, the more unique applications you will get.
EM Algorithm and Stochastic Control in Economics
Kou, Steven, Peng, Xianhua, Xu, Xingbo
Generalising the idea of the classical EM algorithm that is widely used for computing maximum likelihood estimates, we propose an EM-Control (EM-C) algorithm for solving multi-period finite time horizon stochastic control problems. The new algorithm sequentially updates the control policies in each time period using Monte Carlo simulation in a forward-backward manner; in other words, the algorithm goes forward in simulation and backward in optimization in each iteration. Similar to the EM algorithm, the EM-C algorithm has the monotonicity of performance improvement in each iteration, leading to good convergence properties. We demonstrate the effectiveness of the algorithm by solving stochastic control problems in the monopoly pricing of perishable assets and in the study of real business cycle.
Communication-Efficient Distributed Statistical Inference
Jordan, Michael I., Lee, Jason D., Yang, Yun
We present a Communication-efficient Surrogate Likelihood (CSL) framework for solving distributed statistical inference problems. CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schemes and facilitates the construction of confidence intervals. For high-dimensional regularized estimation, CSL leads to a minimax-optimal estimator with controlled communication cost. For Bayesian inference, CSL can be used to form a communication-efficient quasi-posterior distribution that converges to the true posterior. This quasi-posterior procedure significantly improves the computational efficiency of MCMC algorithms even in a non-distributed setting. We present both theoretical analysis and experiments to explore the properties of the CSL approximation.
Estimating Causal Direction and Confounding of Two Discrete Variables
Chalupka, Krzysztof, Eberhardt, Frederick, Perona, Pietro
We propose a method to classify the causal relationship between two discrete variables given only the joint distribution of the variables, acknowledging that the method is subject to an inherent baseline error. We assume that the causal system is acyclicity, but we do allow for hidden common causes. Our algorithm presupposes that the probability distributions $P(C)$ of a cause $C$ is independent from the probability distribution $P(E\mid C)$ of the cause-effect mechanism. While our classifier is trained with a Bayesian assumption of flat hyperpriors, we do not make this assumption about our test data. This work connects to recent developments on the identifiability of causal models over continuous variables under the assumption of "independent mechanisms". Carefully-commented Python notebooks that reproduce all our experiments are available online at http://vision.caltech.edu/~kchalupk/code.html.
Learning heat diffusion graphs
Thanou, Dorina, Dong, Xiaowen, Kressner, Daniel, Frossard, Pascal
Effective information analysis generally boils down to properly identifying the structure or geometry of the data, which is often represented by a graph. In some applications, this structure may be partly determined by design constraints or pre-determined sensing arrangements, like in road transportation networks for example. In general though, the data structure is not readily available and becomes pretty difficult to define. In particular, the global smoothness assumptions, that most of the existing works adopt, are often too general and unable to properly capture localized properties of data. In this paper, we go beyond this classical data model and rather propose to represent information as a sparse combination of localized functions that live on a data structure represented by a graph. Based on this model, we focus on the problem of inferring the connectivity that best explains the data samples at different vertices of a graph that is a priori unknown. We concentrate on the case where the observed data is actually the sum of heat diffusion processes, which is a quite common model for data on networks or other irregular structures. We cast a new graph learning problem and solve it with an efficient nonconvex optimization algorithm. Experiments on both synthetic and real world data finally illustrate the benefits of the proposed graph learning framework and confirm that the data structure can be efficiently learned from data observations only. We believe that our algorithm will help solving key questions in diverse application domains such as social and biological network analysis where it is crucial to unveil proper geometry for data understanding and inference.
Exact Inference Techniques for the Analysis of Bayesian Attack Graphs
Muñoz-González, Luis, Sgandurra, Daniele, Barrère, Martín, Lupu, Emil
Attack graphs are a powerful tool for security risk assessment by analysing network vulnerabilities and the paths attackers can use to compromise network resources. The uncertainty about the attacker's behaviour makes Bayesian networks suitable to model attack graphs to perform static and dynamic analysis. Previous approaches have focused on the formalization of attack graphs into a Bayesian model rather than proposing mechanisms for their analysis. In this paper we propose to use efficient algorithms to make exact inference in Bayesian attack graphs, enabling the static and dynamic network risk assessments. To support the validity of our approach we have performed an extensive experimental evaluation on synthetic Bayesian attack graphs with different topologies, showing the computational advantages in terms of time and memory use of the proposed techniques when compared to existing approaches.
How Bayesian Inference Works
Brandon is an author and deep learning developer. He has worked as Principal Data Scientist at Microsoft, as well as for DuPont Pioneer and Sandia National Laboratories. Brandon earned a Ph.D. in Mechanical Engineering from the Massachusetts Institute of Technology. Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister in London about 300 years ago. He wrote two books, one on theology, and one on probability.
Reparameterization trick for discrete variables
Low-variance gradient estimation is crucial for learning directed graphical models parameterized by neural networks, where the reparameterization trick is widely used for those with continuous variables. While this technique gives low-variance gradient estimates, it has not been directly applicable to discrete variables, the sampling of which inherently requires discontinuous operations. We argue that the discontinuity can be bypassed by marginalizing out the variable of interest, which results in a new reparameterization trick for discrete variables. This reparameterization greatly reduces the variance, which is understood by regarding the method as an application of common random numbers to the estimation. The resulting estimator is theoretically guaranteed to have a variance not larger than that of the likelihood-ratio method with the optimal input-dependent baseline. We give empirical results for variational learning of sigmoid belief networks.
Statistical Inverse Formulation of Optical Flow with Uncertainty Quantification
Optical flow refers to the visual motion observed between two consecutive images. Since the degree of freedom is typically much larger than the constraints imposed by the image observations, the straightforward formulation of optical flow inference is an ill-posed problem. By setting some type of additional "regularity" constraints, classical approaches formulate a well-posed optical flow inference problem in the form of a parameterized set of variational equations. In this work we build a mathematical connection, focused on optical flow methods, between classical variational optical flow approaches and Bayesian statistical inversion. A classical optical flow solution is in fact identical to a maximum a posteriori estimator under the assumptions of linear model with additive independent Gaussian noise and a Gaussian prior distribution. Unlike classical approaches, the statistical inversion approach to optical flow estimation not only allows for "point" estimates, but also provides a distribution of solutions which can be used for ensemble estimation and in particular uncertainty quantification.
Analyzing Games with Ambiguous Player Types using the ${\rm MINthenMAX}$ Decision Model
In many common interactive scenarios, participants lack information about other participants, and specifically about the preferences of other participants. In this work, we model an extreme case of incomplete information, which we term games with type ambiguity, where a participant lacks even information enabling him to form a belief on the preferences of others. Under type ambiguity, one cannot analyze the scenario using the commonly used Bayesian framework, and therefore he needs to model the participants using a different decision model. In this work, we present the ${\rm MINthenMAX}$ decision model under ambiguity. This model is a refinement of Wald's MiniMax principle, which we show to be too coarse for games with type ambiguity. We characterize ${\rm MINthenMAX}$ as the finest refinement of the MiniMax principle that satisfies three properties we claim are necessary for games with type ambiguity. This prior-less approach we present her also follows the common practice in computer science of worst-case analysis. Finally, we define and analyze the corresponding equilibrium concept assuming all players follow ${\rm MINthenMAX}$. We demonstrate this equilibrium by applying it to two common economic scenarios: coordination games and bilateral trade. We show that in both scenarios, an equilibrium in pure strategies always exists and we analyze the equilibria.