Uncertainty
Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data
Vehtari, Aki, Gelman, Andrew, Sivula, Tuomas, Jylänki, Pasi, Tran, Dustin, Sahai, Swupnil, Blomstedt, Paul, Cunningham, John P., Schiminovich, David, Robert, Christian
A common approach for Bayesian computation with big data is to partition the data into smaller pieces, perform local inference for each piece separately, and finally combine the results to obtain an approximation to the global posterior. Looking at this from the bottom up, one can perform separate analyses on individual sources of data and then combine these in a larger Bayesian model. In either case, the idea of distributed modeling and inference has both conceptual and computational appeal, but from the Bayesian perspective there is no general way of handling the prior distribution: if the prior is included in each separate inference, it will be multiply-counted when the inferences are combined; but if the prior is itself divided into pieces, it may not provide enough regularization for each separate computation, thus eliminating one of the key advantages of Bayesian methods. To resolve this dilemma, we propose expectation propagation (EP) as a general prototype for distributed Bayesian inference. The central idea is to factor the likelihood according to the data partitions, and to iteratively combine each factor with an approximate model of the prior and all other parts of the data, thus producing an overall approximation to the global posterior at convergence. In this paper, we give an introduction to EP and an overview of some recent developments of the method, with particular emphasis on its use in combining inferences from partitioned data. In addition to distributed modeling of large datasets, our unified treatment also includes hierarchical modeling of data with a naturally partitioned structure. The paper describes a general algorithmic framework, rather than a specific algorithm, and presents an example implementation for it.
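The prior-counting dilemma described in this abstract can be made concrete with a conjugate toy model. The sketch below is not the paper's implementation. It assumes a scalar Gaussian mean with known noise variance, a case in which each partition's likelihood factor is exactly Gaussian, so combining the factors with a single copy of the prior recovers the full posterior in one pass (general EP instead iterates with moment matching). All function names are hypothetical.

```python
import numpy as np

def gaussian_posterior(y, prior_mean, prior_var, noise_var):
    """Posterior (mean, variance) of a Gaussian mean with known noise variance."""
    prec = 1.0 / prior_var + len(y) / noise_var
    shift = prior_mean / prior_var + np.sum(y) / noise_var
    return shift / prec, 1.0 / prec

def combine_factors(parts, prior_mean, prior_var, noise_var):
    """Add each partition's likelihood natural parameters to ONE copy of the prior."""
    prec = 1.0 / prior_var
    shift = prior_mean / prior_var
    for y in parts:
        prec += len(y) / noise_var        # precision contributed by this partition
        shift += np.sum(y) / noise_var    # precision-times-mean contributed by it
    return shift / prec, 1.0 / prec

def combine_naive(parts, prior_mean, prior_var, noise_var):
    """Multiply local posteriors that each include the prior (prior counted K times)."""
    prec, shift = 0.0, 0.0
    for y in parts:
        m, v = gaussian_posterior(y, prior_mean, prior_var, noise_var)
        prec += 1.0 / v
        shift += m / v
    return shift / prec, 1.0 / prec

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(1.5, 2.0, size=40)
    parts = np.array_split(y, 4)
    print("full data      :", gaussian_posterior(y, 0.0, 1.0, 4.0))
    print("factors + prior:", combine_factors(parts, 0.0, 1.0, 4.0))
    print("naive product  :", combine_naive(parts, 0.0, 1.0, 4.0))
```

In this toy setting the naive product has a smaller variance than the true posterior, because the prior precision enters once per partition rather than once overall.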
Nonparametric Risk Assessment and Density Estimation for Persistence Landscapes
Pakniat, Soroush, Eskandari, Farzad
This paper presents approximate confidence intervals, based on a bootstrap algorithm, for each function of parameters in a Banach space. We apply a kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using the integrated mean square error (IMSE). The results of simulation studies show a significant improvement achieved by our approach compared to the standard confidence interval algorithm. In the next step, we provide several algorithms to solve our model. Finally, a real data analysis compares the accuracy of our method with that of previous works for computing the confidence interval.
News-based forecasts of macroeconomic indicators: A semantic path model for interpretable predictions
Feuerriegel, Stefan, Gordon, Julius
The macroeconomic climate influences operations with regard to, e.g., raw material prices, financing, supply chain utilization and demand quotas. In order to adapt to the economic environment, decision-makers across the public and private sectors require accurate forecasts of the economic outlook. Existing predictive frameworks base their forecasts primarily on time series analysis, as well as the judgments of experts. As a consequence, current approaches are often biased and prone to error. In order to reduce forecast errors, this paper presents an innovative methodology that extends lag variables with unstructured data in the form of financial news: (1) We apply a variety of models from machine learning to word counts as a high-dimensional input. However, this approach suffers from low interpretability and overfitting, motivating the following remedies. (2) We follow the intuition that the economic climate is driven by general sentiments and suggest a projection of words onto latent semantic structures as a means of feature engineering. (3) We propose a semantic path model, together with an estimation technique based on regularization, in order to yield full interpretability of the forecasts. We demonstrate the predictive performance of our approach by utilizing 80,813 ad hoc announcements in order to make long-term forecasts of up to 24 months ahead regarding key macroeconomic indicators. Back-testing reveals a considerable reduction in forecast errors.
Patchwork Kriging for Large-scale Gaussian Process Regression
This paper presents a new approach for Gaussian process (GP) regression for large datasets. The approach involves partitioning the regression input domain into multiple local regions with a different local GP model fitted in each region. Unlike existing local partitioned GP approaches, we introduce a technique for patching together the local GP models nearly seamlessly to ensure that the local GP models for two neighboring regions produce nearly the same response prediction and prediction error variance on the boundary between the two regions. This effectively solves the well-known discontinuity problem that degrades the boundary accuracy of existing local partitioned GP methods. Our main innovation is to represent the continuity conditions as additional pseudo-observations that the differences between neighboring GP responses are identically zero at an appropriately chosen set of boundary input locations. To predict the response at any input location, we simply augment the actual response observations with the pseudo-observations and apply standard GP prediction methods to the augmented data. In contrast to heuristic continuity adjustments, this has an advantage of working within a formal GP framework, so that the GP-based predictive uncertainty quantification remains valid. Our approach also inherits a sparse block-like structure for the sample covariance matrix, which results in computationally efficient closed-form expressions for the predictive mean and variance. In addition, we provide a new spatial partitioning scheme based on a recursive space partitioning along local principal component directions, which makes the proposed approach applicable for regression domains having more than two dimensions. Using three spatial datasets and three higher dimensional datasets, we investigate the numerical performance of the approach and compare it to several state-of-the-art approaches.
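The pseudo-observation idea in this abstract can be illustrated in one dimension. The sketch below is a simplified assumption-laden example, not the paper's algorithm: it uses two regions, one shared RBF kernel, independent local GPs f1 and f2, and a dense linear solve instead of the sparse closed-form expressions the abstract mentions. The constraint f1(b) - f2(b) = 0 at the boundary locations b is added as an extra zero-valued observation, and standard GP conditioning is applied to the augmented data. The names `rbf` and `patched_mean` are hypothetical.

```python
import numpy as np

def rbf(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def patched_mean(x1, y1, x2, y2, boundary, x_star, region, noise=0.01, jitter=1e-10):
    """Posterior mean of the local GP for `region` (1 or 2) at x_star, conditioning
    on both regions' data and on pseudo-observations d = f1(b) - f2(b) = 0."""
    b = np.atleast_1d(np.asarray(boundary, dtype=float))
    n1, n2, nb = len(x1), len(x2), len(b)
    n = n1 + n2 + nb
    K = np.zeros((n, n))
    # f1 and f2 are independent a priori, so the y1/y2 cross block stays zero.
    K[:n1, :n1] = rbf(x1, x1) + noise * np.eye(n1)
    K[n1:n1 + n2, n1:n1 + n2] = rbf(x2, x2) + noise * np.eye(n2)
    # Cov(y1, d) = k(x1, b); Cov(y2, d) = -k(x2, b); Var(d) = 2 k(b, b).
    K[:n1, n1 + n2:] = rbf(x1, b)
    K[n1 + n2:, :n1] = rbf(b, x1)
    K[n1:n1 + n2, n1 + n2:] = -rbf(x2, b)
    K[n1 + n2:, n1:n1 + n2] = -rbf(b, x2)
    K[n1 + n2:, n1 + n2:] = 2.0 * rbf(b, b) + jitter * np.eye(nb)
    targets = np.concatenate([y1, y2, np.zeros(nb)])
    ns = len(x_star)
    if region == 1:
        k_star = np.hstack([rbf(x_star, x1), np.zeros((ns, n2)), rbf(x_star, b)])
    else:
        k_star = np.hstack([np.zeros((ns, n1)), rbf(x_star, x2), -rbf(x_star, b)])
    return k_star @ np.linalg.solve(K, targets)

if __name__ == "__main__":
    x1 = np.linspace(-1.0, -0.05, 15)
    x2 = np.linspace(0.05, 1.0, 15)
    y1, y2 = np.sin(3 * x1), np.sin(3 * x2) + 0.3  # deliberate offset between regions
    b = np.array([0.0])
    print(patched_mean(x1, y1, x2, y2, b, b, region=1),
          patched_mean(x1, y1, x2, y2, b, b, region=2))
```

Because the pseudo-observation is treated as (almost) noise-free, the two local predictive means agree at the boundary up to the small jitter term, even though the two regions' data are offset from each other.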
Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors in High Dimensions
Chen, Victor, Dunlop, Matthew M., Papaspiliopoulos, Omiros, Stuart, Andrew M.
A key problem in inference for high dimensional unknowns is the design of sampling algorithms whose performance scales favourably with the dimension of the unknown. A typical setting in which these problems arise is the area of Bayesian inverse problems. In such problems, which include graph-based learning, nonparametric regression and PDE-based inversion, the unknown can be viewed as an infinite-dimensional parameter (such as a function) that has been discretised. This results in a high-dimensional space for inference. Here we study robustness of an MCMC algorithm for posterior inference; this refers to MCMC convergence rates that do not deteriorate as the discretisation becomes finer. When a Gaussian prior is employed there is a known methodology for the design of robust MCMC samplers. However, one often requires more flexibility than a Gaussian prior can provide: hierarchical models are used to enable inference of parameters underlying a Gaussian prior; or non-Gaussian priors, such as Besov, are employed to induce sparse MAP estimators; or deep Gaussian priors are used to represent other non-Gaussian phenomena; and piecewise constant functions, which are necessarily non-Gaussian, are required for classification problems. The purpose of this article is to show that the simulation technology available for Gaussian priors can be exported to such non-Gaussian priors. The underlying methodology is based on a white noise representation of the unknown. This is exploited both for robust posterior sampling and for joint inference of the function and parameters involved in the specification of its prior, in which case our framework borrows strength from the well-developed non-centred methodology for Bayesian hierarchical models. The desired robustness of the proposed sampling algorithms is supported by some theory and by extensive numerical evidence from several challenging problems.
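The white noise representation described in this abstract can be sketched as follows. The abstract does not name the Gaussian-prior sampler it builds on; the example assumes the preconditioned Crank-Nicolson (pCN) proposal, a commonly used dimension-robust choice, applied to a white noise variable xi that is then pushed through a map T to the non-Gaussian unknown. The function `pcn_whitened`, the map `heavy_tail`, and the toy likelihood are hypothetical illustrations, not the paper's priors or test problems.

```python
import numpy as np

def pcn_whitened(neg_log_lik, transform, dim, n_steps, beta=0.2, seed=0):
    """pCN sampling on white noise xi ~ N(0, I); the unknown is u = transform(xi).

    Returns the chain of transformed samples and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(dim)
    phi = neg_log_lik(transform(xi))
    samples = np.empty((n_steps, dim))
    accepted = 0
    for i in range(n_steps):
        # The pCN proposal leaves N(0, I) invariant, so the accept ratio
        # involves only the likelihood, not the prior density.
        prop = np.sqrt(1.0 - beta ** 2) * xi + beta * rng.standard_normal(dim)
        phi_prop = neg_log_lik(transform(prop))
        if np.log(rng.uniform()) < phi - phi_prop:
            xi, phi = prop, phi_prop
            accepted += 1
        samples[i] = transform(xi)
    return samples, accepted / n_steps

def heavy_tail(xi):
    """Illustrative map from white noise to a heavier-tailed variable (an assumption)."""
    return np.sign(xi) * xi ** 2

if __name__ == "__main__":
    data = np.array([0.5, -1.0, 2.0])
    nll = lambda u: 0.5 * np.sum((u - data) ** 2) / 0.25  # Gaussian noise, variance 0.25
    chain, rate = pcn_whitened(nll, heavy_tail, dim=3, n_steps=2000)
    print("acceptance rate:", rate)
```

The step size `beta` trades off move size against acceptance rate. Because the proposal does not depend on the prior density of u, the same loop can be reused with a different map T.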
A Bayesian and Machine Learning approach to estimating Influence Model parameters for IM-RO
The rise of Online Social Networks (OSNs) has generated an enormous amount of interest from advertisers and researchers seeking to capitalize on their features. Researchers aim to develop strategies for determining how information is propagated among users within an OSN, a process that is captured by diffusion or influence models. We consider the influence models for the IM-RO problem, a novel formulation of the Influence Maximization (IM) problem based on implementing Stochastic Dynamic Programming (SDP). In contrast to existing approaches involving influence spread and the theory of submodular functions, the SDP method focuses on optimizing clicks and ultimately revenue to advertisers in OSNs. Existing approaches to influence maximization have been actively researched over the past decade, with applications to multiple fields; however, our approach is a more practical variant of the original IM problem. In this paper, we provide an analysis of the influence models of the IM-RO problem by conducting experiments on synthetic and real-world datasets. We propose a Bayesian and Machine Learning approach for estimating the parameters of the influence models for the Influence Maximization-Revenue Optimization (IM-RO) problem. We present a Bayesian hierarchical model and implement the well-known Naive Bayes classifier (NBC), Decision Trees classifier (DTC) and Random Forest classifier (RFC) on three real-world datasets. Compared to previous approaches to estimating influence model parameters, our strategy has the great advantage of being directly implementable in standard software packages such as WinBUGS/OpenBUGS/JAGS and Apache Spark. We demonstrate the efficiency and usability of our methods in terms of spreading information and generating revenue for advertisers in the context of OSNs.
User Modelling for Avoiding Overfitting in Interactive Knowledge Elicitation for Prediction
Daee, Pedram, Peltola, Tomi, Vehtari, Aki, Kaski, Samuel
In human-in-the-loop machine learning, the user provides information beyond that in the training data. Many algorithms and user interfaces have been designed to optimize and facilitate this human-machine interaction; however, fewer studies have addressed the potential defects the designs can cause. Effective interaction often requires exposing the user to the training data or its statistics. The design of the system is then critical, as this can lead to double use of data and overfitting, if the user reinforces noisy patterns in the data. We propose a user modelling methodology, by assuming simple rational behaviour, to correct the problem. We show, in a user study with 48 participants, that the method improves predictive performance in a sparse linear regression sentiment analysis task, where graded user knowledge on feature relevance is elicited. We believe that the key idea of inferring user knowledge with probabilistic user models has general applicability in guarding against overfitting and improving interactive machine learning.
Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition
Dorie, Vincent, Hill, Jennifer, Shalit, Uri, Scott, Marc, Cervone, Dan
Statisticians have made great progress in creating methods that reduce our reliance on parametric assumptions. However, this explosion in research has resulted in a breadth of inferential strategies that both create opportunities for more reliable inference and complicate the choices that an applied researcher has to make and defend. Relatedly, researchers advocating for new methods typically compare their method to at best 2 or 3 other causal inference strategies and test using simulations that may or may not be designed to equally tease out flaws in all the competing methods. The causal inference data analysis challenge, "Is Your SATT Where It's At?", launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both of these issues. The researchers creating the data testing grounds were distinct from the researchers submitting methods whose efficacy would be evaluated. Results from 30 competitors across the two versions of the competition (black box algorithms and do-it-yourself analyses) are presented along with post-hoc analyses that reveal information about the characteristics of causal inference strategies and settings that affect performance. The most consistent conclusion was that methods that flexibly model the response surface perform better overall than methods that fail to do so. Finally, new methods are proposed that combine features of several of the top-performing submitted methods.
General Latent Feature Models for Heterogeneous Datasets
Valera, Isabel, Pradier, Melanie F., Lomeli, Maria, Ghahramani, Zoubin
Latent feature modeling allows capturing the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions either for new values of interest or missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes that describe each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous databases. In this paper, we introduce a general Bayesian nonparametric latent feature model suitable for heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allow us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in exploratory data analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package of the GLFM is publicly available for other researchers to use and improve.
Learning and Free Energies for Vector Approximate Message Passing
Fletcher, Alyson K., Schniter, Philip
Vector approximate message passing (VAMP) is a computationally simple approach to the recovery of a signal $\mathbf{x}$ from noisy linear measurements $\mathbf{y}=\mathbf{Ax}+\mathbf{w}$. Like the AMP proposed by Donoho, Maleki, and Montanari in 2009, VAMP is characterized by a rigorous state evolution (SE) that holds for certain large random matrices and that matches the replica prediction of optimality. But while AMP's SE holds only for large i.i.d. sub-Gaussian $\mathbf{A}$, VAMP's SE holds under a much larger class: right-rotationally invariant $\mathbf{A}$. To run VAMP, however, one must specify the statistical parameters of the signal and noise. This work combines VAMP with Expectation-Maximization to yield an algorithm, EM-VAMP, that can jointly recover $\mathbf{x}$ while learning those statistical parameters. The fixed points of the proposed EM-VAMP algorithm are shown to be stationary points of a certain constrained free-energy, providing a variational interpretation of the algorithm. Numerical simulations show that EM-VAMP is robust to highly ill-conditioned $\mathbf{A}$ with performance nearly matching oracle-parameter VAMP.