 Bayesian Inference


Constructing a Chain Event Graph from a Staged Tree

arXiv.org Machine Learning

Chain Event Graphs (CEGs) are a recent family of probabilistic graphical models - a generalisation of Bayesian Networks - providing an explicit representation of structural zeros and context-specific conditional independences within their graph topology. A CEG is constructed from an event tree through a sequence of transformations beginning with the colouring of the vertices of the event tree to identify one-step transition symmetries. This coloured event tree, also known as a staged tree, is the output of the learning algorithms used for this family. Surprisingly, no general algorithm has yet been devised that automatically transforms any staged tree into a CEG representation. In this paper we provide a simple iterative backward algorithm for this transformation. Additionally, we show that no information is lost from transforming a staged tree into a CEG. Finally, we demonstrate that with an optimal stopping time, our algorithm is more efficient than the generalisation of a special case presented in Silander and Leong (2013). We also provide Python code using this algorithm to obtain a CEG from any staged tree along with the functionality to add edges with sampling zeros.
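The core idea of collapsing a staged tree into a CEG can be illustrated with a small sketch. This is not the authors' algorithm or their Python code; it is a minimal recursive illustration under stated assumptions: the tree is given as a hypothetical `children` mapping (node to a list of `(edge_label, child)` pairs) and a hypothetical `stage` mapping (node to stage colour), and two situations are merged into one position when they share a stage and their correspondingly labelled edges lead to the same positions. All leaves are merged into a single sink, here named `w_inf`.

```python
def staged_tree_to_ceg(children, stage, root):
    """Merge situations of a staged tree into CEG positions (illustrative sketch).

    children: dict mapping node -> list of (edge_label, child) pairs (assumed format)
    stage:    dict mapping each non-leaf node -> its stage colour (assumed format)
    Returns (position_of, ceg_edges).
    """
    position_of = {}   # tree node -> CEG position id
    positions = {}     # signature -> CEG position id
    ceg_edges = {}     # position id -> list of (edge_label, target position id)

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # Every leaf is collapsed into the single sink vertex.
            position_of[node] = "w_inf"
            return "w_inf"
        # Children are resolved first, so positions are assigned from the leaves backward.
        out = tuple(sorted((label, visit(child)) for label, child in kids))
        signature = (stage[node], out)
        if signature not in positions:
            positions[signature] = f"w{len(positions)}"
            ceg_edges[positions[signature]] = list(out)
        position_of[node] = positions[signature]
        return positions[signature]

    visit(root)
    return position_of, ceg_edges


# Hypothetical toy tree: s1 and s2 share a stage and have identical futures.
children = {
    "s0": [("a", "s1"), ("b", "s2")],
    "s1": [("x", "l1"), ("y", "l2")],
    "s2": [("x", "l3"), ("y", "l4")],
}
stage = {"s0": "u0", "s1": "u1", "s2": "u1"}
position_of, ceg_edges = staged_tree_to_ceg(children, stage, "s0")
```

In this toy example `s1` and `s2` end up in the same position, so the resulting graph has two non-sink positions instead of three situations. The recursion is used only for brevity; a deep tree would call for an explicit level-by-level backward pass.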


Statistical inference of assortative community structures

arXiv.org Machine Learning

[...] concept (for which there are many). Historically, most community detection methods proposed have focused on the detection of assortative communities, i.e. groups of nodes that tend to be more connected to themselves than to other nodes in the network. However, there are also community detection methods that are more general, and attempt to cluster together nodes that have similar patterns of connection, regardless of whether they are assortative or not [3-5]. The widespread use of assortative community detection methods has led to the belief that the presence of communities is a pervasive feature of many different kinds of real networks [6]. Although the concept of assortativity is a central one in the study of social networks (known as "homophily" in that context) [7], and is also an appealing construct in biology [8-10], it is to some extent unclear if the perceived assortativity of many networks [...] These approaches, however, are based on general mixing patterns, which include assortativity only as a special case. In many ways this is useful, and in fact arguably superior, since if assortativity happens to be the dominating pattern, then the general approach will capture it, otherwise it will reveal a different structure. However, having only a more general method at our disposal also has its shortcomings. First, if it is true that assortativity is the main pattern for a class of networks, then the more general representation is needlessly wasteful for them, since it not only gives us more than we need, but in doing so it prevents us from focusing on the more central features, at the cost of algorithmic precision. Second, with a more general method it can be difficult to quantify precisely how much has been wasted in the representation, and what is indeed the simpler pattern hiding inside it.


Optimal Thinning of MCMC Output

arXiv.org Machine Learning

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in both Python and MATLAB.
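Greedy minimisation of a kernel Stein discrepancy can be sketched in a few lines of NumPy. The code below is an illustrative sketch, not the API of the Stein Thinning package: the function names, the choice of an inverse multiquadric base kernel with parameters `c=1` and `beta=-0.5`, and the standard normal test target are all assumptions made for this example. It needs only the sample path `x` and the gradients `s` of the log target density at each state.

```python
import numpy as np


def stein_kernel_matrix(x, s, c=1.0, beta=-0.5):
    """Langevin Stein kernel matrix for the base kernel k(x, y) = (c^2 + |x - y|^2)^beta.

    x: (n, d) array of states; s: (n, d) array of grad log p evaluated at the states.
    """
    n, d = x.shape
    r = x[:, None, :] - x[None, :, :]            # pairwise differences x_i - x_j
    sq = np.sum(r ** 2, axis=-1)                 # squared distances
    q = c ** 2 + sq
    # Divergence term: div_x div_y k(x, y)
    div = -4 * beta * (beta - 1) * q ** (beta - 2) * sq - 2 * beta * d * q ** (beta - 1)
    # Cross terms: grad_x k . s(y) + grad_y k . s(x)
    r_dot_sj = np.einsum("ijd,jd->ij", r, s)
    r_dot_si = np.einsum("ijd,id->ij", r, s)
    cross = 2 * beta * q ** (beta - 1) * (r_dot_sj - r_dot_si)
    # Score product term: k(x, y) s(x) . s(y)
    return div + cross + q ** beta * (s @ s.T)


def greedy_stein_thinning(x, s, m):
    """Greedily pick m indices (repeats allowed) that keep the KSD of the selection small."""
    K = stein_kernel_matrix(x, s)
    objective = np.diag(K) / 2.0                 # k_p(x_i, x_i) / 2 for every candidate
    selected = []
    for _ in range(m):
        i = int(np.argmin(objective))
        selected.append(i)
        objective = objective + K[:, i]          # add interaction with the new point
    return np.array(selected)


# Hypothetical usage: thin draws whose target is a standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
s = -x
idx = greedy_stein_thinning(x, s, 10)
```

The full kernel matrix costs memory quadratic in the chain length, so this sketch is only suitable for short sample paths; a practical implementation would compute kernel columns on demand.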


Generative Adversarial Networks (GANs) & Bayesian Networks

#artificialintelligence

Generative Adversarial Network (GAN) software produces forgeries and imitations of data (also known as synthetic data or fake data). Human beings have been making fakes, with good or evil intent, of almost everything they possibly can, since the beginning of the human race. Thus, perhaps not too surprisingly, GAN software has been widely used since it was first proposed in this amazingly recent 2014 paper. To gauge how widely GAN software has been used so far, see, for example, this 2019 article entitled "18 Impressive Applications of Generative Adversarial Networks (GANs)". The forgeries span sounds (voices, music, ...), images (realistic pictures, paintings, drawings, handwriting, ...), text, etc. They can be tweaked so that they range from being very similar to the originals to being whimsical exaggerations thereof.


Empirically Verifying Hypotheses Using Reinforcement Learning

arXiv.org Artificial Intelligence

This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying structure of many hypotheses, factorizing them as {pre-condition, action sequence, post-condition} triplets. By leveraging this structure we show that RL agents are able to succeed at the task. Furthermore, subsequent fine-tuning of the policies allows the agent to correctly verify hypotheses not amenable to the above factorization.


Sampler Design for Implicit Feedback Data by Noisy-label Robust Learning

arXiv.org Machine Learning

Implicit feedback data is extensively explored in recommendation as it is easy to collect and generally applicable. However, predicting users' preferences from implicit feedback data is a challenging task since we can only observe positive (voted) samples and unvoted samples. It is difficult to distinguish the negative samples from the unlabeled positive samples among the unvoted ones. Existing works, such as Bayesian Personalized Ranking (BPR), uniformly sample unvoted items as negative samples and therefore suffer from a critical noisy-label issue. To address this gap, we design an adaptive sampler based on noisy-label robust learning for implicit feedback data. To formulate the issue, we first introduce Bayesian Point-wise Optimization (BPO) to learn a model, e.g., Matrix Factorization (MF), by maximum likelihood estimation. We predict users' preferences with the model and learn it by maximizing the likelihood of the observed data labels, i.e., a user prefers her positive samples and has no interest in her unvoted samples. However, in reality, a user may be interested in some of her unvoted samples, which are in fact positive samples mislabeled as negative ones. We then consider the risk of these noisy labels and propose a Noisy-label Robust BPO (NBPO). NBPO also maximizes the observation likelihood while connecting users' preferences and observed labels through the likelihood of label flipping, based on Bayes' theorem. In NBPO, a user prefers her true positive samples and shows no interest in her true negative samples, hence the optimization quality is dramatically improved. Extensive experiments on two public real-world datasets show the significant improvement of our proposed optimization methods.


Bayesian Low Rank Tensor Ring Model for Image Completion

arXiv.org Machine Learning

The low-rank tensor ring model is powerful for image completion, which recovers missing entries in data acquisition and transformation. The recently proposed tensor ring (TR) based completion algorithms generally solve the low-rank optimization problem by the alternating least squares method with predefined ranks, which may easily lead to overfitting when the unknown ranks are set too large and only a few measurements are available. In this paper, we present a Bayesian low-rank tensor ring model for image completion that automatically learns the low-rank structure of the data. A multiplicative interaction model is developed for the low-rank tensor ring decomposition, where core factors are enforced to be sparse by assuming their entries obey a Student-t distribution. Compared with most of the existing methods, the proposed one is free of parameter tuning, and the TR ranks can be obtained by Bayesian inference. Numerical experiments, including synthetic data, color images of different sizes, and YaleFace dataset B with respect to one pose, show that the proposed approach outperforms state-of-the-art ones, especially in terms of recovery accuracy.


Probabilistic Classification Vector Machine for Multi-Class Classification

arXiv.org Machine Learning

The probabilistic classification vector machine (PCVM) synthesizes the advantages of both the support vector machine and the relevance vector machine, delivering a sparse Bayesian solution to classification problems. However, the PCVM is currently only applicable to binary cases. Extending the PCVM to multi-class cases via heuristic voting strategies such as one-vs-rest or one-vs-one often results in a dilemma where classifiers make contradictory predictions, and those strategies might lose the benefits of probabilistic outputs. To overcome this problem, we extend the PCVM and propose a multi-class probabilistic classification vector machine (mPCVM). Two learning algorithms, one top-down and one bottom-up, have been implemented in the mPCVM. The top-down algorithm obtains the maximum a posteriori (MAP) point estimates of the parameters based on an expectation-maximization algorithm, and the bottom-up algorithm is an incremental paradigm that maximizes the marginal likelihood. The superior performance of the mPCVMs, especially when the investigated problem has a large number of classes, is extensively evaluated on synthetic and benchmark data sets.


Statistical Foundation of Variational Bayes Neural Networks

arXiv.org Machine Learning

Despite the popularity of Bayesian neural networks in recent years, their use is somewhat limited in complex and big data situations due to the computational cost associated with full posterior evaluations. Variational Bayes (VB) provides a useful alternative that circumvents the computational cost and time complexity associated with generating samples from the true posterior using Markov Chain Monte Carlo (MCMC) techniques. The efficacy of VB methods is well established in the machine learning literature. However, their potential broader impact is hindered by a lack of theoretical validity from a statistical perspective: there are few results on the theoretical properties of VB, especially in non-parametric problems. In this paper, we establish the fundamental result of posterior consistency for the mean-field variational posterior (VP) for a feed-forward artificial neural network model. The paper underlines the conditions needed to guarantee that the VP concentrates around Hellinger neighborhoods of the true density function. Additionally, the role of the scale parameter and its influence on the convergence rates are discussed. The paper mainly relies on two results: (1) the rate at which the true posterior grows, and (2) the rate at which the KL-distance between the posterior and the variational posterior grows. The theory provides a guideline for building prior distributions for Bayesian NN models along with an assessment of the accuracy of the corresponding VB implementation.


Bayes' Theorem in Layman's Terms

#artificialintelligence

If you have difficulty understanding Bayes' theorem, trust me, you are not alone. In this tutorial, I'll help you cross that bridge step by step. Suppose Alex and Brenda are two people in your office. While you were working, someone walked past you, and you didn't notice who it was. Now I'll give you extra information. Let's calculate the probabilities with this new information: the probability that Alex is the person who passed by is 2/5, and the probability that Brenda is the person who passed by is 3/5. Probabilities calculated before the new information are called priors, and probabilities calculated after the new information are called posteriors. Consider a scenario where Alex comes to the office 3 days a week and Brenda comes to the office 1 day a week.
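The prior-to-posterior update can be made concrete with a short sketch. The attendance figures (3 days versus 1 day a week) are taken from the scenario above and used as priors. The likelihoods of the extra information are hypothetical values chosen only for illustration and do not come from the tutorial, so the resulting posteriors are not the tutorial's numbers. The `posterior` helper is likewise an illustrative name.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem: posterior is proportional to prior times likelihood."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())               # total probability of the observation
    return {h: joint[h] / evidence for h in joint}


# Priors from attendance: Alex 3 days a week, Brenda 1 day a week.
priors = {"Alex": Fraction(3, 4), "Brenda": Fraction(1, 4)}

# Hypothetical likelihoods of the new observation under each person (assumed values).
likelihoods = {"Alex": Fraction(1, 3), "Brenda": Fraction(1, 2)}

post = posterior(priors, likelihoods)
print(post["Alex"], post["Brenda"])  # 2/3 1/3
```

With these assumed likelihoods the observation favours Brenda, so Alex's probability drops from the prior 3/4 to a posterior 2/3 while Brenda's rises from 1/4 to 1/3.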