Uncertainty
Bayesian model selection consistency and oracle inequality with intractable marginal likelihood
In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.
Information Pursuit: A Bayesian Framework for Sequential Scene Parsing
Jahangiri, Ehsan, Yoruk, Erdem, Vidal, Rene, Younes, Laurent, Geman, Donald
Despite enormous progress in object detection and classification, the problem of incorporating expected contextual relationships among object instances into modern recognition systems remains a key challenge. In this work we propose Information Pursuit, a Bayesian framework for scene parsing that combines prior models for the geometry of the scene and the spatial arrangement of object instances with a data model for the output of high-level image classifiers trained to answer specific questions about the scene. In the proposed framework, the scene interpretation is progressively refined as evidence accumulates from the answers to a sequence of questions. At each step, we choose the question that maximizes the mutual information between the new answer and the full interpretation, given the current evidence obtained from previous inquiries. We also propose a method for learning the parameters of the model from synthesized, annotated scenes obtained by top-down sampling from an easy-to-learn generative scene model. Finally, we introduce a database of annotated indoor scenes of dining room tables, which we use to evaluate the proposed approach.
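The greedy question-selection step can be illustrated on a toy problem. The sketch below is a simplification under stated assumptions, not the authors' model: the "interpretation" is one of a few discrete hypotheses, each question is a yes/no predicate on it, and every classifier answer is wrong with a fixed, symmetric probability `error`. All names (`pursue`, `bayes_update`, the example questions) are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def answer_prob(posterior, predicate, error):
    """P(answer = True) under the current posterior, for a hypothetical noisy
    classifier that reports predicate(h) correctly with probability 1 - error."""
    p_true = sum(w for h, w in posterior.items() if predicate(h))
    return p_true * (1 - error) + (1 - p_true) * error


def mutual_information(posterior, predicate, error):
    """I(answer; interpretation | evidence) = H(answer) - H(answer | interpretation).
    With symmetric noise, H(answer | interpretation) equals H_b(error) for every h."""
    p1 = answer_prob(posterior, predicate, error)
    return entropy([p1, 1 - p1]) - entropy([error, 1 - error])


def bayes_update(posterior, predicate, error, answer):
    """Condition the posterior on one observed (possibly wrong) answer."""
    new = {h: w * ((1 - error) if predicate(h) == answer else error)
           for h, w in posterior.items()}
    z = sum(new.values())
    return {h: w / z for h, w in new.items()}


def pursue(prior, questions, error, oracle, steps):
    """Greedy loop: ask the not-yet-asked question with maximal mutual information."""
    posterior, remaining, asked = dict(prior), dict(questions), []
    for _ in range(min(steps, len(remaining))):
        name = max(remaining,
                   key=lambda q: mutual_information(posterior, remaining[q], error))
        predicate = remaining.pop(name)
        posterior = bayes_update(posterior, predicate, error, oracle(predicate))
        asked.append(name)
    return asked, posterior


# Toy example: eight candidate interpretations, four hypothetical questions.
prior = {h: 1 / 8 for h in range(8)}
questions = {
    "h >= 4": lambda h: h >= 4,
    "h mod 4 >= 2": lambda h: h % 4 >= 2,
    "h odd": lambda h: h % 2 == 1,
    "h == 0": lambda h: h == 0,
}
truth = 5
asked, post = pursue(prior, questions, error=0.1,
                     oracle=lambda pred: pred(truth), steps=3)
```

Because the three "balanced" questions each split the remaining probability mass roughly in half, they carry more information than the narrow question `h == 0`, which is never chosen here; after three answers the posterior concentrates on `h = 5`.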
Variational Bayesian Inference of Line Spectra
Badiu, Mihai-Alin, Hansen, Thomas Lundgaard, Fleury, Bernard Henri
In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid; and the coefficients are governed by a Bernoulli-Gaussian prior model turning model order selection into binary sequence detection. Unlike earlier works which retain only point estimates of the frequencies, we undertake a more complete Bayesian treatment by estimating the posterior probability density functions (pdfs) of the frequencies and computing expectations over them. Thus, we additionally capture and operate with the uncertainty of the frequency estimates. Aiming to maximize the model evidence, variational optimization provides analytic approximations of the posterior pdfs and also gives estimates of the additional parameters. We propose an accurate representation of the pdfs of the frequencies by mixtures of von Mises pdfs, which yields closed-form expectations. We define the algorithm VALSE in which the estimates of the pdfs and parameters are iteratively updated. VALSE is a gridless, convergent method, does not require parameter tuning, can easily include prior knowledge about the frequencies and provides approximate posterior pdfs based on which the uncertainty in line spectral estimation can be quantified. Simulation results show that accounting for the uncertainty of frequency estimates, rather than computing just point estimates, significantly improves the performance. The performance of VALSE is superior to that of state-of-the-art methods and closely approaches the Cramér-Rao bound computed for the true model order.
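The closed-form expectations rest on a standard property of the von Mises distribution: if θ ~ VM(μ, κ), then E[e^{jnθ}] = I_n(κ)/I_0(κ) · e^{jnμ}, where I_n is the modified Bessel function of the first kind. For a mixture of von Mises pdfs, the expectation is the weighted sum of the component expectations. The pure-Python sketch below (hypothetical helper names; not the VALSE implementation, whose sign and indexing conventions may differ) checks the identity against numerical integration.

```python
import cmath
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind I_n(x), integer n >= 0,
    from its power series sum_m (x/2)^(2m+n) / (m! (m+n)!)."""
    half = x / 2.0
    return sum(half ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))


def von_mises_pdf(theta, mu, kappa):
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i(0, kappa))


def expected_phasor(n, mu, kappa):
    """Closed form of E[exp(j n theta)] for theta ~ VonMises(mu, kappa)."""
    return bessel_i(n, kappa) / bessel_i(0, kappa) * cmath.exp(1j * n * mu)


def mixture_expected_phasor(n, components):
    """Mixture of von Mises pdfs given as (weight, mu, kappa) triples, weights summing to 1:
    the expectation is linear in the pdf, so it is the weighted sum of component values."""
    return sum(w * expected_phasor(n, mu, kappa) for w, mu, kappa in components)


def expected_phasor_numeric(n, mu, kappa, grid=2000):
    """Same expectation via the periodic trapezoidal rule on [0, 2*pi)."""
    step = 2 * math.pi / grid
    return sum(von_mises_pdf(k * step, mu, kappa) * cmath.exp(1j * n * k * step)
               for k in range(grid)) * step
```

The series form of I_n keeps the example dependency-free; in practice a library routine for modified Bessel functions would be the more robust choice, especially for large κ.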
Coupled Compound Poisson Factorization
Basbug, Mehmet E., Engelhardt, Barbara E.
We present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model. We derive a stochastic variational inference algorithm for the resulting model and, as examples of our framework, implement three different data-generating models (a mixture model, linear regression, and factor analysis) to robustly model non-random missing data in the context of clustering, prediction, and matrix factorization. In all three cases, we test our framework against models that ignore the missing-data mechanism on large-scale studies with non-random missing data, and we show that explicitly modeling the missing-data mechanism substantially improves the quality of the results, as measured using data log likelihood on a held-out test set.
Graph Structure Learning from Unlabeled Data for Event Detection
Somanchi, Sriram, Neill, Daniel B.
Processes such as disease propagation and information diffusion often spread over some latent network structure which must be learned from observation. Given a set of unlabeled training examples representing occurrences of an event type of interest (e.g., a disease outbreak), our goal is to learn a graph structure that can be used to accurately detect future events of that type. Motivated by new theoretical results on the consistency of constrained and unconstrained subset scans, we propose a novel framework for learning graph structure from unlabeled data by comparing the most anomalous subsets detected with and without the graph constraints. Our framework uses the mean normalized log-likelihood ratio score to measure the quality of a graph structure, and efficiently searches for the highest-scoring graph structure. Using simulated disease outbreaks injected into real-world Emergency Department data from Allegheny County, we show that our method learns a structure similar to the true underlying graph, but enables faster and more accurate detection.
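The abstract compares the most anomalous subsets detected with and without graph constraints but does not spell out the scan statistic. The toy sketch below assumes the expectation-based Poisson log-likelihood ratio and the linear-time subset scanning property known from earlier subset-scan work (an assumption, not necessarily this paper's exact score); it does not implement the mean normalized score used to rank graph structures. Function names are hypothetical, and the connected scan is brute force, so it only suits tiny graphs.

```python
import itertools
import math


def poisson_llr(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for aggregate count C and
    aggregate baseline B: C log(C/B) + B - C if C > B, else 0."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def unconstrained_scan(counts, baselines):
    """Linear-time subset scan: for this statistic the best subset is one of
    the top-k nodes ranked by the priority C_i / B_i."""
    order = sorted(counts, key=lambda v: counts[v] / baselines[v], reverse=True)
    best, best_set, c, b = 0.0, set(), 0.0, 0.0
    for k, v in enumerate(order, start=1):
        c += counts[v]
        b += baselines[v]
        score = poisson_llr(c, b)
        if score > best:
            best, best_set = score, set(order[:k])
    return best, best_set


def is_connected(nodes, edges):
    """Depth-first search restricted to `nodes` in an undirected adjacency dict."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def connected_scan(counts, baselines, edges):
    """Graph-constrained scan: exhaustive search over connected subsets."""
    best, best_set = 0.0, set()
    for r in range(1, len(counts) + 1):
        for subset in itertools.combinations(counts, r):
            if is_connected(subset, edges):
                score = poisson_llr(sum(counts[v] for v in subset),
                                    sum(baselines[v] for v in subset))
                if score > best:
                    best, best_set = score, set(subset)
    return best, best_set


# Toy path graph a - b - c - d with elevated counts at a and c.
counts = {"a": 10, "b": 2, "c": 10, "d": 2}
baselines = {"a": 4, "b": 4, "c": 4, "d": 4}
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
free_score, free_set = unconstrained_scan(counts, baselines)       # {'a', 'c'}
graph_score, graph_set = connected_scan(counts, baselines, edges)  # {'a', 'b', 'c'}
```

The two scans disagree here: without the graph, the best subset is the disconnected pair {a, c}; with the constraint, the scan must include b to connect them, which lowers the score. Differences of this kind between constrained and unconstrained detections are what a structure-learning criterion can exploit.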
Gaussian Process Quadrature Moment Transform
Prüher, Jakub, Straka, Ondřej
Computing the moments of transformed random variables is a problem that appears in many engineering applications. Current methods for moment transformation are mostly based on classical quadrature rules, which cannot account for approximation errors. Our aim is to design a moment transformation method for Gaussian random variables that accounts for the error in the numerically computed mean. We employ an instance of Bayesian quadrature, called Gaussian process quadrature (GPQ), which allows us to treat the integral itself as a random variable whose variance quantifies the incurred integration error. Experiments on coordinate transformation and nonlinear filtering examples show that the proposed GPQ moment transform performs better than the classical transforms.
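A minimal one-dimensional sketch of GP quadrature for E[g(x)] with x ~ N(m, s²), assuming a zero-mean GP prior on g with a squared-exponential kernel k(x, x') = α² exp(-(x - x')²/(2ℓ²)), unit hyperparameters, and hand-picked unit sigma points; the paper's kernel, hyperparameters, and point sets may differ. With this kernel, the kernel mean q_i = E[k(x, x_i)] = α² ℓ/√(ℓ² + s²) · exp(-(x_i - m)²/(2(ℓ² + s²))) and the prior integral variance α² ℓ/√(ℓ² + 2s²) are available in closed form, so the posterior mean of the integral is wᵀg with weights w = K⁻¹q, and its variance is the prior variance minus qᵀw.

```python
import math


def rbf(x1, x2, alpha, ell):
    """Squared-exponential kernel (assumed choice)."""
    return alpha ** 2 * math.exp(-((x1 - x2) ** 2) / (2 * ell ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gpq_mean(g, m, s, points, alpha=1.0, ell=1.0, jitter=1e-10):
    """GP quadrature estimate of E[g(x)], x ~ N(m, s^2).
    Returns (posterior mean of the integral, posterior variance of the integral)."""
    xs = [m + s * u for u in points]  # unit sigma points mapped to N(m, s^2)
    n = len(xs)
    K = [[rbf(xs[i], xs[j], alpha, ell) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    ls2 = ell ** 2 + s ** 2
    # Kernel mean q_i = E[k(x, x_i)], closed form for an RBF kernel and Gaussian input.
    q = [alpha ** 2 * ell / math.sqrt(ls2) * math.exp(-((xi - m) ** 2) / (2 * ls2))
         for xi in xs]
    w = solve(K, q)  # quadrature weights w = K^{-1} q
    mean = sum(wi * g(xi) for wi, xi in zip(w, xs))
    prior_var = alpha ** 2 * ell / math.sqrt(ell ** 2 + 2 * s ** 2)
    var = prior_var - sum(wi * qi for wi, qi in zip(w, q))
    return mean, var


# Example: mean of y = sin(x) for x ~ N(0, 1), with three and five sigma points.
mean3, var3 = gpq_mean(math.sin, 0.0, 1.0, [-1.0, 0.0, 1.0])
mean5, var5 = gpq_mean(math.sin, 0.0, 1.0, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Unlike a classical quadrature rule, the second returned value gives an explicit measure of integration uncertainty; adding points (here the five-point set contains the three-point set) can only shrink it in exact arithmetic.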
Probabilistic Multigraph Modeling for Improving the Quality of Crowdsourced Affective Data
Ye, Jianbo, Li, Jia, Newman, Michelle G., Adams, Reginald B. Jr., Wang, James Z.
We propose a probabilistic approach to jointly modeling participants' reliability and humans' regularity in crowdsourced affective studies. Reliability measures how likely a subject is to respond to a question seriously; regularity measures how often a human agrees with other seriously entered responses from a targeted population. Crowdsourcing-based studies or experiments that rely on human self-reported affect pose additional challenges compared with typical crowdsourcing studies that aim to acquire concrete, non-affective labels of objects. The reliability of participants has been extensively studied for typical non-affective crowdsourcing studies, whereas the regularity of humans in an affective experiment has not, in its own right, been thoroughly considered. It has often been observed that different individuals have different feelings about the same test question, which has no single correct response in the first place. High reliability of responses from one individual therefore does not necessarily imply high consensus across individuals. Instead, investigators are interested in testing the consensus of a population globally. Built upon the agreement multigraph among tasks and workers, our probabilistic model differentiates subject regularity from population reliability. We demonstrate the method's effectiveness for in-depth, robust analysis of large-scale crowdsourced affective data, including emotion and aesthetic assessments collected by presenting visual stimuli to human subjects.
An Interval-Based Bayesian Generative Model for Human Complex Activity Recognition
Liu, Li, Yang, Yongzhong, Govindarajan, Lakshmi Narasimhan, Wang, Shu, Hu, Bin, Cheng, Li, Rosenblum, David S.
A complex activity consists of a set of temporally composed events of atomic actions, which are the lowest-level events that can be directly detected from sensors. In other words, a complex activity is usually composed of multiple atomic actions occurring consecutively and concurrently over a duration of time. Modeling and recognizing complex activities remains an open research question because it faces several challenges. First, understanding complex activities calls for not only the inference of atomic actions but also the interpretation of their rich temporal dependencies. Second, individuals often have diverse styles of performing the same complex activity. As a result, a complex activity recognition model should be capable of capturing and propagating the underlying uncertainties over atomic actions and their temporal relationships. Third, a complex activity recognition model should also tolerate errors introduced at the atomic action level by sensor noise or low-level prediction errors.
A. Related Work
Currently, much research focuses on semantic-based complex activity modeling. Many semantic-based models, such as context-free grammars (CFG) [26] and Markov logic networks (MLN) [11], [18], are used to represent complex activities and can handle rich temporal relations.
Semidefinite tests for latent causal structures
Kela, Aditya, von Prillwitz, Kai, Aberg, Johan, Chaves, Rafael, Gross, David
In spite of the primary importance of discovering causal relations in science, the statistical analysis of empirical data has historically shied away from causality. Only relatively recently has a rigorous theory of causality emerged (see, for instance, [1, 2]), showing that empirical data can indeed contain information about causation rather than mere correlation. Since then, causal inference has quickly become influential. Examples range from the inference of genetic [3] and social networks [4] to a better understanding of the role of causality within quantum physics [5-13]. To formalize causal mechanisms, it has become popular to use directed acyclic graphs (DAGs), where nodes denote random variables and directed edges (arrows) account for their causal relations. Central problems in this context include inference or model selection ('Given samples from a number of observable variables, which DAG should we associate with them?') and hypothesis testing ('Can the observed data be explained in terms of an assumed DAG?'). Here, we concentrate on the latter problem and propose a novel solution based on the covariances that a given causal structure gives rise to.
Probabilistic Feature Selection and Classification Vector Machine
Jiang, Bingbing, Li, Chang, Chen, Huanhuan, Yao, Xin, de Rijke, Maarten
Sparse Bayesian learning is one of the state-of-the-art machine learning approaches and can make stable and reliable probabilistic predictions. However, some of these algorithms, e.g., the probabilistic classification vector machine (PCVM) and the relevance vector machine (RVM), are not capable of eliminating irrelevant and redundant features, which can degrade performance. To tackle this problem, we propose a sparse Bayesian classifier that simultaneously selects the relevant samples and features. We name this classifier the probabilistic feature selection and classification vector machine (PFCVM); it employs truncated Gaussian distributions as both sample and feature priors. To derive an analytical solution for the proposed algorithm, we use the Laplace approximation to compute approximate posteriors and marginal likelihoods. We then obtain the optimized parameters and hyperparameters by type-II maximum likelihood. Experiments on synthetic, benchmark, and high-dimensional data sets validate the performance of PFCVM under two criteria: classification accuracy and the efficacy of the selected features. Finally, we analyze the generalization performance of PFCVM and derive a generalization error bound for it. By tightening the bound, we demonstrate the significance of sparseness for the model.