Pitkow, Xaq, Ahmadian, Yashar, Miller, Ken D.

Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals "unbelievable." This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals.
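As an illustration of what "approximate inference on graphical models with loops" means in practice, the following is a minimal sketch of sum-product loopy belief propagation on a three-node binary cycle, compared against brute-force exact marginals. It is not the authors' code: the function names (`loopy_bp`, `exact_marginals`), the damping scheme, the iteration count, and the numerical parameters are all assumptions chosen for illustration.

```python
import itertools
import numpy as np

def exact_marginals(unary, pair, edges):
    """Brute-force single-node marginals of a binary pairwise model.
    unary: (n, 2) log-potentials; pair: list of (2, 2) log-potentials, one per edge."""
    n = unary.shape[0]
    p = np.zeros((2,) * n)
    for x in itertools.product([0, 1], repeat=n):
        logp = sum(unary[i, x[i]] for i in range(n))
        logp += sum(pair[e][x[i], x[j]] for e, (i, j) in enumerate(edges))
        p[x] = np.exp(logp)
    p /= p.sum()
    return np.array([p.sum(axis=tuple(k for k in range(n) if k != i))
                     for i in range(n)])

def loopy_bp(unary, pair, edges, iters=200, damping=0.5):
    """Damped sum-product updates (illustrative; iters and damping are assumptions)."""
    phi = np.exp(unary)
    msgs, pot = {}, {}
    for e, (i, j) in enumerate(edges):
        msgs[(i, j)] = np.ones(2) / 2        # message i -> j, a function of x_j
        msgs[(j, i)] = np.ones(2) / 2        # message j -> i, a function of x_i
        pot[(i, j)] = np.exp(pair[e])        # indexed [x_i, x_j]
        pot[(j, i)] = np.exp(pair[e]).T      # indexed [x_j, x_i]
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Unary potential at i times all incoming messages except the one from j.
            prod = phi[i].copy()
            for (k, l) in msgs:
                if l == i and k != j:
                    prod *= msgs[(k, l)]
            m = pot[(i, j)].T @ prod         # sum over x_i, result indexed by x_j
            m /= m.sum()
            new[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * m
        msgs = new
    beliefs = phi.copy()
    for (k, l) in msgs:
        beliefs[l] *= msgs[(k, l)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Hypothetical parameters: attractive couplings on a 3-cycle with small biases.
unary = np.array([[0.0, 0.5], [0.0, -0.3], [0.0, 0.2]])
coupling = np.array([[0.8, -0.8], [-0.8, 0.8]])
edges = [(0, 1), (1, 2), (0, 2)]
pair = [coupling] * len(edges)

bp = loopy_bp(unary, pair, edges)
ex = exact_marginals(unary, pair, edges)
print("BP beliefs:\n", bp)
print("Exact marginals:\n", ex)
print("Max abs gap:", np.abs(bp - ex).max())
```

On a tree (for example, dropping the edge `(0, 2)`) the same routine recovers the exact marginals; on the cycle the beliefs are only an approximation, which is the gap that parameter learning is meant to compensate for.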

It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, in which the Bethe free energy is minimized subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP, because BP is a dual rather than a primal algorithm for solving the Bethe variational problem: beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm that seeks a common zero of a system of non-linear functions that are not explicitly related to each other. In this theoretical paper, we show that these functions are in fact explicitly related: they are the partial derivatives of a single function of reparameterizations. That is, BP seeks a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies.
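For reference, the "linear combination of local entropies" can be written out for a pairwise model on a graph with nodes $V$ and edges $E$. The notation here ($b_i$, $b_{ij}$ for node and edge beliefs, $d_i$ for the degree of node $i$, $Z_i$, $Z_{ij}$ for local partition functions) is assumed for illustration and is not taken from the abstract. The first line is the standard Bethe entropy; the second is only a schematic reading of the abstract's description, and the exact definition and sign conventions are those of the paper.

```latex
% Bethe entropy of a pairwise model (standard form)
H_{\mathrm{Bethe}}(b) = \sum_{(i,j) \in E} H(b_{ij}) - \sum_{i \in V} (d_i - 1)\, H(b_i)

% Schematic analogue suggested by the abstract (assumed form):
% the same coefficients applied to local log-partition functions
% of a reparameterization \theta
\sum_{(i,j) \in E} \log Z_{ij}(\theta) - \sum_{i \in V} (d_i - 1)\, \log Z_i(\theta)
```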

Meshi, Ofer, Jaimovich, Ariel, Globerson, Amir, Friedman, Nir

The introduction of loopy belief propagation (LBP) revitalized the application of graphical models in many domains. Many recent works present improvements on the basic LBP algorithm in an attempt to overcome convergence and local optima problems. Notable among these are convexified free energy approximations that lead to inference procedures with provable convergence and quality properties. However, empirically LBP still outperforms most of its convex variants in a variety of settings, as we also demonstrate here. Motivated by this fact, we seek convexified free energies that directly approximate the Bethe free energy. We show that the proposed approximations compare favorably with state-of-the-art convex free energy approximations.

Hazan, Tamir, Shashua, Amnon

In this paper we treat both forms of probabilistic inference, estimating marginal probabilities of the joint distribution and finding the most probable assignment, through a unified message-passing algorithm architecture. We generalize the Belief Propagation (BP) algorithms of sum-product and max-product and the tree-reweighted (TRW) sum-product and max-product algorithms (TRBP), and introduce a new set of convergent algorithms based on "convex-free-energy" and Linear-Programming (LP) relaxation as the zero-temperature limit of a convex-free-energy. The main idea of this work arises from taking a general perspective on the existing BP and TRBP algorithms while observing that they are all reductions from the basic optimization formula of $f + \sum_i h_i$, where the function $f$ is an extended-valued, strictly convex but non-smooth function and the functions $h_i$ are extended-valued functions (not necessarily convex). We use tools from convex duality to present the "primal-dual ascent" algorithm, which is an extension of the Bregman successive projection scheme and is designed to handle optimization of the general type $f + \sum_i h_i$. Mapping the fractional-free-energy variational principle to this framework introduces the "norm-product" message-passing. Special cases include sum-product and max-product (BP algorithms) and the TRBP algorithms. When the fractional-free-energy is set to be convex (convex-free-energy), the norm-product is globally convergent for estimating marginal probabilities and for approximating the LP-relaxation. We also introduce another branch of the norm-product, the "convex-max-product". The convex-max-product is convergent (unlike max-product) and aims at solving the LP-relaxation.