approximate inference algorithm
AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms
Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific data set. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the approximating distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.
FactorGraphNeuralNetwork
Most of the successful deep neural network architectures are structured, often consisting of elements like convolutional neural networks and gated recurrent neural networks. Recently, graph neural networks (GNNs) have been successfully applied to graph-structureddata such as point cloud and molecular data. These networks often only consider pairwise dependencies, as they operate on a graph structure.
Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language
Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, we propose a strategy for automating such derivations. Unlike previous systems which focus on relationships between pairs of random variables, our system (which we call Autoconj) operates directly on Python functions that compute log-joint distribution functions. Autoconj provides support for conjugacy-exploiting algorithms in any Python-embedded PPL. This paves the way for accelerating development of novel inference algorithms and structure-exploiting modeling strategies.
Reviews: A Learning Error Analysis for Structured Prediction with Approximate Inference
This paper is on the important topic of learning with approximate inference. Previous work, e.g., Kulesza and Pereira (2007), has demonstrated the importance of matching parameter update rules and inference approximation methods. This paper presents a new update rule based on PAC Bayes bounds, which is fairly agnostic to the inference algorithm used -- it assumes a multiplicative error bound on model score and supports both over and under approximations. The example given in section 3.2 is a great illustration of how approximation error is more subtle than we might think it is. Sometimes an approximate predictor can fit the training data better because it represents a different family of functions!
Reviews: AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms
I do still have some concerns regarding applicability, since a gold standard "ground truth" of inference is required. I do appreciate the situation described in the feedback, where one is trying to decide on some approximate algorithm to "deploy" in the wild, is actually fairly common. That is, the sort of setting where very slow MCMC can be run for a long time on training data, but on new data, where there is e.g. a real-time requirement, a faster approximate inference algorithm will be used instead. The approach is based on constructing an estimator of the "symmetric" KL divergence (i.e., the sum of the forward and reverse KL) between an approximation to the target distribution and a representation of the "true" exact target distribution. The overall approach considered is interesting, and for the most part clearly presented.
A Learning Error Analysis for Structured Prediction with Approximate Inference
Yuanbin Wu, Man Lan, Shiliang Sun, Qi Zhang, Xuanjing Huang
In this work, we try to understand the differences between exact and approximate inference algorithms in structured prediction. We compare the estimation and approximation error of both underestimate (e.g., greedy search) and overestimate (e.g., linear relaxation of integer programming) models. The result shows that, from the perspective of learning errors, performances of approximate inference could be as good as exact inference. The error analyses also suggest a new margin for existing learning algorithms. Empirical evaluations on text classification, sequential labelling and dependency parsing witness the success of approximate inference and the benefit of the proposed margin.
On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks
In this paper, we argue for representing networks as a bag of {\it triangular motifs}, particularly for important network problems that current model-based approaches handle poorly due to computational bottlenecks incurred by using edge representations. Such approaches require both 1-edges and 0-edges (missing edges) to be provided as input, and as a consequence, approximate inference algorithms for these models usually require \Omega(N 2) time per iteration, precluding their application to larger real-world networks. In contrast, triangular modeling requires less computation, while providing equivalent or better inference quality. A triangular motif is a vertex triple containing 2 or 3 edges, and the number of such motifs is \Theta(\sum_{i}D_{i} {2}) (where D_i is the degree of vertex i), which is much smaller than N 2 for low-maximum-degree networks. Using this representation, we develop a novel mixed-membership network model and approximate inference algorithm suitable for large networks with low max-degree.
Scalable Quasi-Bayesian Inference for Instrumental Variable Regression
Wang, Ziyu, Zhou, Yuhao, Ren, Tongzheng, Zhu, Jun
Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a scalable quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models. Contrary to Bayesian modeling for IV, our approach does not require additional assumptions on the data generating process, and leads to a scalable approximate inference algorithm with time cost comparable to the corresponding point estimation methods. Our algorithm can be further extended to work with neural network models. We analyze the theoretical properties of the proposed quasi-posterior, and demonstrate through empirical evaluation the competitive performance of our method.
On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks
Ho, Qirong, Yin, Junming, Xing, Eric P.
In this paper, we argue for representing networks as a bag of {\it triangular motifs}, particularly for important network problems that current model-based approaches handle poorly due to computational bottlenecks incurred by using edge representations. Such approaches require both 1-edges and 0-edges (missing edges) to be provided as input, and as a consequence, approximate inference algorithms for these models usually require $\Omega(N 2)$ time per iteration, precluding their application to larger real-world networks. In contrast, triangular modeling requires less computation, while providing equivalent or better inference quality. A triangular motif is a vertex triple containing 2 or 3 edges, and the number of such motifs is $\Theta(\sum_{i}D_{i} {2})$ (where $D_i$ is the degree of vertex $i$), which is much smaller than $N 2$ for low-maximum-degree networks. Using this representation, we develop a novel mixed-membership network model and approximate inference algorithm suitable for large networks with low max-degree.