Uncertainty
Hierarchical Multinomial-Dirichlet model for the estimation of conditional probability tables
Azzimonti, L., Corani, G., Zaffalon, M.
Abstract: We present a novel approach for estimating conditional probability tables, based on a joint, rather than independent, estimate of the conditional distributions belonging to the same table. We derive exact analytical expressions for the estimators and we analyse their properties both analytically and via simulation. We then apply this method to the estimation of parameters in a Bayesian network. Given the structure of the network, the proposed approach better estimates the joint distribution and significantly improves the classification performance with respect to traditional approaches. I. INTRODUCTION: A Bayesian network is a probabilistic model constituted by a directed acyclic graph (DAG) and a set of conditional probability tables (CPTs), one for each node. The CPT of node X contains the conditional probability distributions of X given each possible configuration of its parents. Usually all variables are discrete and the conditional distributions are estimated adopting a Multinomial-Dirichlet model, where the Dirichlet prior is characterised by the vector of hyper-parameters α. Yet, Bayesian estimation of multinomials is sensitive to the choice of α and inappropriate values cause the estimator to perform poorly [1].
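To make the sensitivity to α concrete, here is a minimal sketch of the posterior-mean estimate of a single conditional distribution under the traditional Multinomial-Dirichlet model. It is not the hierarchical estimator proposed in the paper. The function name `posterior_mean`, the symmetric prior (the same pseudo-count for every state) and the example counts are assumptions made for illustration.

```python
def posterior_mean(counts, alpha):
    """Posterior mean of a multinomial under a symmetric Dirichlet prior.

    counts: observed counts n_i for each state of X under one parent
            configuration (illustrative data, not from the paper).
    alpha:  prior pseudo-count added to every state (assumed symmetric).
    """
    k = len(counts)
    total = sum(counts) + k * alpha
    return [(n + alpha) / total for n in counts]


counts = [3, 0, 1]  # made-up counts for a three-state variable
for alpha in (0.1, 1.0, 10.0):
    estimate = posterior_mean(counts, alpha)
    print(alpha, [round(p, 3) for p in estimate])
```

With so few observations, a large α pulls the estimate toward the uniform distribution, while a small α leaves the unobserved state with almost no probability mass.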
Exchangeable Random Measures for Sparse and Modular Graphs with Overlapping Communities
Todeschini, Adrien, Miscouridou, Xenia, Caron, François
A network is composed of a set of nodes, or vertices, with connections between them. Network data arise in a wide range of fields, and include social networks, collaboration networks, communication networks, biological networks and food webs; they are a useful way of representing interactions between sets of objects. Of particular importance is the elaboration of random graph models, which can capture the salient properties of real-world graphs. Following the seminal work of Erdős and Rényi (1959), various network models have been proposed; see the overviews of Newman (2003b, 2009), Kolaczyk (2009), Bollobás (2001), Goldenberg et al. (2010), Fienberg (2012) or Jacobs and Clauset (2014). In particular, a large body of the literature has concentrated on models that can capture some modular or community structure within the network. The first statistical network model in this line of research is the popular stochastic block-model (Holland et al., 1983; Snijders and Nowicki, 1997; Nowicki and Snijders, 2001). The stochastic block-model assumes that each node belongs to one of p latent communities, and the probability of connection between two nodes is given by a p × p connectivity matrix. This model has been extended in various directions, by introducing degree-correction parameters (Karrer and Newman, 2011), by allowing the number of communities to grow with the size of the network (Kemp et al., 2006), or by considering overlapping communities (Airoldi et al., 2008; Miller et al., 2009; Latouche et al., 2011; Palla et al., 2012; Yang and Leskovec, 2013). Stochastic block-models and their extensions have been shown to offer a very flexible modeling framework, with interpretable parameters, and have been successfully used for the analysis of numerous real-world networks.
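The basic stochastic block-model described above can be sketched in a few lines. This covers only the classical model, not the exchangeable random measure construction of the paper. The function name `sample_sbm`, the undirected, self-loop-free convention and the example parameters are assumptions made for illustration.

```python
import random


def sample_sbm(block_of, connectivity, seed=0):
    """Sample an undirected graph from a stochastic block-model.

    block_of:     block_of[i] is the latent community of node i (0..p-1).
    connectivity: p x p matrix; connectivity[a][b] is the probability of an
                  edge between a node in community a and one in community b.
    Returns a list of edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    n = len(block_of)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p_edge = connectivity[block_of[i]][block_of[j]]
            if rng.random() < p_edge:
                edges.append((i, j))
    return edges


# Two communities with dense within-community and sparse between-community links.
block_of = [0, 0, 0, 1, 1, 1]
connectivity = [[0.9, 0.1],
                [0.1, 0.9]]
print(sample_sbm(block_of, connectivity))
```

Each pair of nodes is connected independently, so the community structure enters only through the connectivity matrix entry selected by the two nodes' labels.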
Flexible Low-Rank Statistical Modeling with Side Information
Fithian, William, Mazumder, Rahul
We propose a general framework for reduced-rank modeling of matrix-valued data. By applying a generalized nuclear norm penalty we can directly model low-dimensional latent variables associated with rows and columns. Our framework flexibly incorporates row and column features, smoothing kernels, and other sources of side information by penalizing deviations from the row and column models. Moreover, a large class of these models can be estimated scalably using convex optimization. The computational bottleneck in each case is one singular value decomposition per iteration of a large but easy-to-apply matrix. Our framework generalizes traditional convex matrix completion and multi-task learning methods as well as maximum a posteriori estimation under a large class of popular hierarchical Bayesian models.
Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations
Raissi, Maziar, Karniadakis, George Em
While there is currently a lot of enthusiasm about "big data", useful data is usually "small" and expensive to acquire. In this paper, we present a new paradigm of learning partial differential equations from small data. In particular, we introduce hidden physics models, which are essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time-dependent and nonlinear partial differential equations, to extract patterns from high-dimensional data generated from experiments. The proposed methodology may be applied to the problem of learning, system identification, or data-driven discovery of partial differential equations. Our framework relies on Gaussian processes, a powerful tool for probabilistic inference over functions, that enables us to strike a balance between model complexity and data fitting. The effectiveness of the proposed approach is demonstrated through a variety of canonical problems, spanning a number of scientific domains, including the Navier-Stokes, Schrödinger, Kuramoto-Sivashinsky, and time-dependent linear fractional equations. The methodology provides a promising new direction for harnessing the long-standing developments of classical methods in applied mathematics and mathematical physics to design learning machines with the ability to operate in complex domains without requiring large quantities of data.
Likelihood-free inference by ratio estimation
Dutta, Ritabrata, Corander, Jukka, Kaski, Samuel, Gutmann, Michael U.
We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution. In another popular approach called approximate Bayesian computation, the inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. Synthetic likelihood is easier to use as no measure of "closeness" is required, but the Gaussianity assumption is often limiting. Moreover, both approaches require judiciously chosen summary statistics. We here present an alternative inference approach that is as easy to use as synthetic likelihood but not as restricted in its assumptions, and that, in a natural way, enables automatic selection of relevant summary statistics from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on toy problems and use it to perform inference for stochastic nonlinear dynamical systems.
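The link between ratio estimation and logistic regression can be illustrated on a toy problem. This is a generic density-ratio sketch, not the authors' full method: there are no summary statistics, no penalty terms and no posterior. The two Gaussian distributions, the function name `fit_logistic`, and the learning rate and step count are assumptions chosen for illustration. With equally many samples from each distribution, the fitted log-odds approximate the log-ratio of the two densities, which here is exactly x - 0.5.

```python
import math
import random


def fit_logistic(xs, ys, lr=0.5, steps=300):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
n = 1000
numerator = [rng.gauss(1.0, 1.0) for _ in range(n)]    # samples from N(1, 1)
denominator = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples from N(0, 1)
xs = numerator + denominator
ys = [1.0] * n + [0.0] * n

w, b = fit_logistic(xs, ys)
# With balanced classes, w * x + b estimates
# log( N(x; 1, 1) / N(x; 0, 1) ) = x - 0.5, so w should be near 1 and b near -0.5.
print(round(w, 2), round(b, 2))
```

Only samples from the two distributions are needed, never their densities, which is what makes the approach usable when the likelihood is intractable.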
Sum-Product Graphical Models
Desana, Mattia, Schnörr, Christoph
This paper introduces a new probabilistic architecture called Sum-Product Graphical Model (SPGM). SPGMs combine traits from Sum-Product Networks (SPNs) and Graphical Models (GMs): like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context-specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, the new architecture represents a class of probability distributions that combines, for the first time, the semantics of graphical models with the evaluation efficiency of SPNs. We also propose a novel algorithm for learning both the structure and the parameters of SPGMs. A comparative empirical evaluation demonstrates competitive performance of our approach in density estimation.
Bayesian Network Learning via Topological Order
Park, Young Woong, Klabjan, Diego
We propose a mixed integer programming (MIP) model and iterative algorithms based on topological orders to solve optimization problems with acyclic constraints on a directed graph. The proposed MIP model has significantly fewer constraints than popular MIP models based on cycle elimination constraints and triangular inequalities. The proposed iterative algorithms use gradient descent and iterative reordering approaches, respectively, for searching topological orders. A computational experiment is presented for the Gaussian Bayesian network learning problem, an optimization problem minimizing the sum of squared errors of regression models with L1 penalty over a feature network, with application to gene network inference in bioinformatics.
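The connection between topological orders and acyclicity can be sketched with Kahn's algorithm, a standard technique that is not the MIP model or the iterative algorithms of the paper. A directed graph is acyclic exactly when a topological order of all its nodes exists. The function name `topological_order` and the example graphs are assumptions made for illustration.

```python
from collections import deque


def topological_order(num_nodes, edges):
    """Return a topological order of the graph, or None if it has a cycle."""
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Nodes without incoming edges can be placed first.
    ready = deque(node for node in range(num_nodes) if in_degree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # If some node was never placed, it lies on or behind a cycle.
    return order if len(order) == num_nodes else None


print(topological_order(3, [(2, 0), (0, 1)]))          # a DAG
print(topological_order(3, [(0, 1), (1, 2), (2, 0)]))  # contains a cycle
```

Conversely, fixing an order and allowing only edges that point forward in it guarantees acyclicity, which is why searching over orders is a natural way to handle acyclic constraints.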
Probabilistic Reasoning with Abstract Argumentation Frameworks
Hunter, Anthony, Thimm, Matthias
Abstract argumentation offers an appealing way of representing and evaluating arguments and counterarguments. This approach can be enhanced by considering probability assignments on arguments, allowing for a quantitative treatment of formal argumentation. In this paper, we regard the assignment as denoting the degree of belief that an agent has in an argument being acceptable. While there are various interpretations of this, an example is how it could be applied to a deductive argument. Here, the degree of belief that an agent has in an argument being acceptable is a combination of the degree to which it believes the premises, the claim, and the derivation of the claim from the premises. We consider constraints on these probability assignments, inspired by crisp notions from classical abstract argumentation frameworks and discuss the issue of probabilistic reasoning with abstract argumentation frameworks. Moreover, we consider the scenario when assessments on the probabilities of a subset of the arguments are given and the probabilities of the remaining arguments have to be derived, taking both the topology of the argumentation framework and principles of probabilistic reasoning into account. We generalise this scenario by also considering inconsistent assessments, i.e., assessments that contradict the topology of the argumentation framework. Building on approaches to inconsistency measurement, we present a general framework to measure the amount of conflict of these assessments and provide a method for inconsistency-tolerant reasoning.
AI – The Present in the Making
I attended the Huawei European Innovation Day recently, and was enthralled by how the new technology is giving rise to industrial revolutions. These revolutions are what will eventually unlock the development potential around the world. It is important to leverage the emerging technologies, since they are the resources which will lead us to innovation and progress. Huawei is innovative in its partnerships and collaboration to define the future, and the event was a huge success. For many people, the concept of Artificial Intelligence (AI) is a thing of the future. It is the technology that has yet to be introduced.
Statistical Latent Space Approach for Mixed Data Modelling and Applications
Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha
The analysis of mixed data has been raising challenges in statistics and machine learning. One of the two most prominent challenges is to develop new statistical techniques and methodologies that handle mixed data effectively by making the data less heterogeneous with minimum loss of information. The other challenge is that such methods must be applicable to large-scale tasks involving huge amounts of mixed data. To tackle these challenges, we introduce parameter sharing and balancing extensions to our recent model, the mixed-variate restricted Boltzmann machine (MV.RBM), which can transform heterogeneous data into a homogeneous representation. We also integrate structured sparsity and distance metric learning into RBM-based models. Our proposed methods are applied in various applications, including latent patient profile modelling in medical data analysis and representation learning for image retrieval. The experimental results demonstrate that the models perform better than baseline methods on medical data and outperform state-of-the-art rivals on an image dataset.