
Calculating conditional probability in Bernoulli mixture model

#artificialintelligence

I'm following Bishop's book Pattern Recognition and Machine Learning on the Bernoulli mixture model, and trying to code it. But I don't understand how to calculate the conditional probability (page 446 of the first edition). In the E-step I'm supposed to calculate this. But it is said that we should use the log of the probability, so as to avoid numerical underflow. So how do I apply it here? I can't see any way to do it.
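One common way to handle this, sketched here under my own assumptions rather than taken from the book's text, is the log-sum-exp trick. Compute log pi_k + sum_i [x_i log mu_ki + (1 - x_i) log(1 - mu_ki)] for each component, subtract the largest of these values before exponentiating, and only then normalise. The names `log_sum_exp`, `responsibilities` and the clipping constant `eps` are illustrative choices, not anything from Bishop.

```python
import math

def log_sum_exp(values):
    """Return log(sum(exp(v))) without overflowing or underflowing."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def responsibilities(x, pis, mus, eps=1e-12):
    """E-step responsibilities for one binary vector x.

    pis: mixing coefficients, one per component.
    mus: per-component Bernoulli means, each the same length as x.
    eps: hypothetical clipping constant that keeps log() away from 0.
    """
    log_joint = []
    for pi_k, mu_k in zip(pis, mus):
        lp = math.log(pi_k)
        for x_i, mu_ki in zip(x, mu_k):
            mu_ki = min(max(mu_ki, eps), 1.0 - eps)
            lp += x_i * math.log(mu_ki) + (1 - x_i) * math.log(1.0 - mu_ki)
        log_joint.append(lp)
    # Normalise in the log domain, then exponentiate the (safe) differences.
    log_norm = log_sum_exp(log_joint)
    return [math.exp(lp - log_norm) for lp in log_joint]

# 2000 pixels: the raw likelihood 0.5**2000 underflows to 0.0 in float64,
# so the naive ratio would be 0/0, while the log-domain version stays finite.
x = [1] * 2000
gamma = responsibilities(x, [0.3, 0.7], [[0.5] * 2000, [0.5] * 2000])
```

Because both components assign the same likelihood in this toy case, the responsibilities should simply fall back to the mixing coefficients, which makes it an easy sanity check.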


Robust subsampling-based sparse Bayesian inference to tackle four challenges (large noise, outliers, data integration, and extrapolation) in the discovery of physical laws from data

arXiv.org Machine Learning

The derivation of physical laws is a dominant topic in scientific research. We propose a new method capable of discovering the physical laws from data to tackle four challenges in the previous methods. The four challenges are: (1) large noise in the data, (2) outliers in the data, (3) integrating the data collected from different experiments, and (4) extrapolating the solutions to the areas that have no available data. To resolve these four challenges, we try to discover the governing differential equations and develop a model-discovering method based on sparse Bayesian inference and subsampling. The subsampling technique is used for improving the accuracy of the Bayesian learning algorithm here, while it is usually employed for estimating statistics or speeding up algorithms elsewhere. The optimal subsampling size is moderate, neither too small nor too big. Another merit of our method is that it can work with limited data by virtue of Bayesian inference. We demonstrate how to use our method to tackle the four aforementioned challenges step by step through numerical examples: (1) predator-prey model with noise, (2) shallow water equations with outliers, (3) heat diffusion with random initial and boundary conditions, and (4) fish-harvesting problem with bifurcations. Numerical results show that the robustness and accuracy of our new method are significantly better than those of the other model-discovering methods and traditional regression methods.


Coupling material and mechanical design processes via computer model calibration

arXiv.org Machine Learning

Real-world optimization problems typically involve multiple objectives. This is particularly true in the design of engineering systems, where multiple performance outcomes are balanced against budgetary constraints. Among the complexities of optimizing over multiple objectives is the effect of uncertainties in the problem. Design is guided by models known to be imperfect, systems are built using materials with uncertainty regarding their properties, variations occur in the construction of designed systems, and so on. These imperfections, uncertainties and errors cause uncertainty also in the solution to a design problem. In traditional engineering design, one designs a system after choosing a material with appropriate properties for the project from a database of known materials. As a result, the design of the system is constrained by the initial material selection. By coupling material discovery and engineering system design, we can combine these two traditionally separate processes under the umbrella of a unified multiple objective optimization problem. In this paper, we cast the engineering design problem in the framework of computer model calibration.


Recursion, Probability, Convolution and Classification for Computations

arXiv.org Artificial Intelligence

The main motivation of this work was practical: to offer computationally and theoretically scalable ways of structuring large classes of computation. It started from attempts to optimize R code for machine learning/artificial intelligence algorithms for huge data sets that, due to their size, should be handled in an incremental (online) fashion. Our targets are large classes of relational (attribute-based), mathematical (index-based) or graph computations. We wanted to use powerful computation representations that emerged in AI (artificial intelligence)/ML (machine learning), such as BN (Bayesian networks) and CNN (convolution neural networks). For the classes of computation we address, and for our HPC (high performance computing) needs, the current solutions for translating computations into such representations need to be extended. Our results show that the classes of computation we target can be tree-structured, with a probability distribution (defining a DBN, i.e. Dynamic Bayesian Network) associated with the tree. Moreover, this DBN may be viewed as a recursive CNN (Convolution Neural Network). Within this tree-like structure, classification into classes with size bounded (by a parameterizable bound) may be performed. These results are at the core of very powerful, yet highly practical, algorithms for restructuring and parallelizing the computations. The mathematical background required for an in-depth presentation exposing the full generality of our approach is the subject of a subsequent paper. In this paper, we work in a limited (but important) framework that can be understood with rudiments of linear algebra and graph theory. The focus is on applicability: most of this paper discusses the usefulness of our approach for solving hard compilation problems related to automatic parallelism.


Bayesian Inference with Generative Adversarial Network Priors

arXiv.org Machine Learning

Bayesian inference is used extensively to infer and to quantify the uncertainty in a field of interest from a measurement of a related field when the two are linked by a physical model. Despite its many applications, Bayesian inference faces challenges when inferring fields that have discrete representations of large dimension, and/or have prior distributions that are difficult to represent mathematically. In this manuscript we consider the use of Generative Adversarial Networks (GANs) in addressing these challenges. A GAN is a type of deep neural network equipped with the ability to learn the distribution implied by multiple samples of a given field. Once trained on these samples, the generator component of a GAN maps the iid components of a low-dimensional latent vector to an approximation of the distribution of the field of interest. In this work we demonstrate how this approximate distribution may be used as a prior in a Bayesian update, and how it addresses the challenges associated with characterizing complex prior distributions and the large dimension of the inferred field. We demonstrate the efficacy of this approach by applying it to the problem of inferring and quantifying uncertainty in the initial temperature field in a heat conduction problem from a noisy measurement of the temperature at a later time.


Classification with the matrix-variate-$t$ distribution

arXiv.org Machine Learning

Matrix-variate distributions can intuitively model the dependence structure of matrix-valued observations that arise in applications with multivariate time series, spatio-temporal or repeated measures. This paper develops an Expectation-Maximization algorithm for discriminant analysis and classification with matrix-variate $t$-distributions. The methodology shows promise on simulated datasets and when applied to the forensic matching of fractured surfaces or the classification of functional magnetic resonance, satellite or hand-gesture images.


A Sufficient Statistic for Influence in Structured Multiagent Environments

arXiv.org Artificial Intelligence

Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI [4, 3, 41, 45, 47, 2] has tried to mitigate this problem by trying to bring down interaction to its core: how does the policy of one agent influence another agent? If we can find more compact representations of such influence, this can help us deal with the complexity, for instance by searching the space of influences rather than that of policies [45]. However, so far these notions of influence have been restricted in their applicability to special cases of interaction. In this paper we formalize influence-based abstraction (IBA), which facilitates the elimination of latent state factors without any loss in value, for a very general class of problems described as factored partially observable stochastic games (fPOSGs) [33]. This generalizes existing descriptions of influence, and thus can serve as the foundation for improvements in scalability and other insights in decision making in complex settings.


Improving Neural Network Classifier using Gradient-based Floating Centroid Method

arXiv.org Artificial Intelligence

The floating centroid method (FCM) offers an efficient way to solve the fixed-centroid problem for neural network classifiers. However, using evolutionary computation as its optimization method prevents the FCM from achieving satisfactory performance across different neural network structures, because of the high computational complexity and inefficiency. Traditional gradient-based methods have been extensively adopted to optimize neural network classifiers. In this study, a gradient-based floating centroid (GDFC) method is introduced to address the fixed-centroid problem for neural network classifiers optimized by gradient-based methods. Furthermore, a new loss function for optimizing GDFC is introduced. The experimental results show that GDFC obtains better classification performance than the comparison methods on the benchmark datasets.


Learning Probabilities: Towards a Logic of Statistical Learning

arXiv.org Artificial Intelligence

We propose a new model for forming beliefs and learning about unknown probabilities (such as the probability of picking a red marble from a bag with an unknown distribution of coloured marbles). The most widespread model for such situations of 'radical uncertainty' is in terms of imprecise probabilities, i.e. representing the agent's knowledge as a set of probability measures. We add to this model a plausibility map, associating to each measure a plausibility number, as a way to go beyond what is known with certainty and represent the agent's beliefs about probability. There are a number of standard examples: Shannon Entropy, Centre of Mass etc. We then consider learning of two types of information: (1) learning by repeated sampling from the unknown distribution (e.g. picking marbles from the bag); and (2) learning higher-order information about the distribution (in the shape of linear inequalities, e.g. we are told there are more red marbles than green marbles). The first changes only the plausibility map (via a 'plausibilistic' version of Bayes' Rule), but leaves the given set of measures unchanged; the second shrinks the set of measures, without changing their plausibility. Beliefs are defined as in Belief Revision Theory, in terms of truth in the most plausible worlds. But our belief change does not comply with standard AGM axioms, since the revision induced by (1) is of a non-AGM type. This is essential, as it allows our agents to learn the true probability: we prove that the beliefs obtained by repeated sampling converge almost surely to the correct belief (in the true probability). We end by sketching the contours of a dynamic doxastic logic for statistical learning.


Some New Results for Poisson Binomial Models

arXiv.org Machine Learning

We consider a problem of ecological inference, in which individual-level covariates are known, but labeled data is available only at the aggregate level. The intended application is modeling voter preferences in elections. In Rosenman and Viswanathan (2018), we proposed modeling individual voter probabilities via a logistic regression, and posing the problem as a maximum likelihood estimation for the parameter vector beta. The likelihood is a Poisson binomial, the distribution of the sum of independent but not identically distributed Bernoulli variables, though we approximate it with a heteroscedastic Gaussian for computational efficiency. Here, we extend the prior work by proving results about the existence of the MLE and the curvature of this likelihood, which is not log-concave in general. We further demonstrate the utility of our method on a real data example. Using data on voters in Morris County, NJ, we demonstrate that our approach outperforms other ecological inference methods in predicting a related, but known outcome: whether an individual votes.
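As a rough illustration of the distribution named in this abstract, the sketch below computes the exact Poisson binomial PMF by a simple dynamic-programming recursion and the mean and variance that a moment-matched Gaussian would use. The function names are hypothetical, and matching the first two moments is my assumption about what a Gaussian approximation would look like here, not the paper's stated construction.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of the sum of independent Bernoulli(p_i) variables.

    pmf[k] is the probability that exactly k of the trials succeed.
    """
    pmf = [1.0]  # with zero trials, the sum is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf

def gaussian_moments(probs):
    """Mean and variance of the sum, for a moment-matched normal."""
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return mean, var

# Hypothetical per-voter probabilities, purely for illustration.
voter_probs = [0.2, 0.5, 0.9]
pmf = poisson_binomial_pmf(voter_probs)
mean, var = gaussian_moments(voter_probs)
```

The exact recursion costs O(n^2) for n individuals, which hints at why an approximation could be attractive for computational efficiency at the scale of an electorate.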