Directed Networks


Elements of Sequential Monte Carlo

arXiv.org Machine Learning

A core problem in statistics and probabilistic machine learning is to compute probability distributions and expectations. This is the fundamental problem of Bayesian statistics and machine learning, which frames all inference as expectations with respect to the posterior distribution. The key challenge is to approximate these intractable expectations. In this tutorial, we review sequential Monte Carlo (SMC), a random-sampling-based class of methods for approximate inference. First, we explain the basics of SMC, discuss practical issues, and review theoretical results. We then examine two of the main user design choices: the proposal distributions and the so-called intermediate target distributions. We review recent results on how variational inference and amortization can be used to learn efficient proposals and target distributions. Next, we discuss the SMC estimate of the normalizing constant and how it can be used for pseudo-marginal inference and inference evaluation. Throughout the tutorial we illustrate the use of SMC on various models commonly used in machine learning, such as stochastic recurrent neural networks, probabilistic graphical models, and probabilistic programs.
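
As a rough illustration of the basic SMC loop described above (propose, weight, resample, and accumulate an estimate of the normalizing constant), here is a minimal sketch of a bootstrap particle filter. The toy linear-Gaussian model, the function name `bootstrap_particle_filter`, and all parameter values are assumptions made for this example; they are not taken from the tutorial.

```python
import math
import random


def bootstrap_particle_filter(observations, num_particles=500, seed=0):
    """Bootstrap SMC on an assumed toy model:
    x_0 ~ N(0, 1),  x_t = 0.9 * x_{t-1} + N(0, 1),  y_t = x_t + N(0, 1).
    Returns filtering means and the log of the normalizing-constant estimate.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    log_z = 0.0
    means = []
    for y in observations:
        # Proposal: sample from the transition density (the "bootstrap" choice).
        particles = [0.9 * x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight each particle by the observation likelihood N(y; x, 1).
        log_w = [-0.5 * (y - x) ** 2 - 0.5 * math.log(2 * math.pi)
                 for x in particles]
        max_lw = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        # The average unnormalized weight estimates p(y_t | y_{1:t-1}).
        log_z += max_lw + math.log(total / num_particles)
        norm_w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(norm_w, particles)))
        # Multinomial resampling according to the normalized weights.
        particles = rng.choices(particles, weights=norm_w, k=num_particles)
    return means, log_z


if __name__ == "__main__":
    filtered_means, log_evidence = bootstrap_particle_filter([0.3, -0.1, 0.8, 1.2])
    print(filtered_means, log_evidence)
```

Resampling at every step is the simplest choice; a common refinement is to resample only when the effective sample size drops below a threshold.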


Testing Conditional Independence on Discrete Data using Stochastic Complexity

arXiv.org Machine Learning

Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Among other results, we show that our proposed test, SCI, is an asymptotically unbiased and $L_2$-consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.
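
For readers unfamiliar with conditional mutual information on discrete data, the following sketch computes the plain plug-in (empirical) estimate of I(X; Y | Z) from counts. This is not the SCI test proposed in the paper, which is based on stochastic complexity; the function name `empirical_cmi` is an assumption made for illustration.

```python
import math
from collections import Counter


def empirical_cmi(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from three aligned sequences
    of discrete values:
        sum over (x, y, z) of p(x, y, z) * log( p(x, y, z) p(z) / (p(x, z) p(y, z)) )
    with every probability replaced by its empirical frequency.
    """
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # The factors of 1/n cancel inside the logarithm, leaving raw counts.
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log(ratio)
    return cmi


if __name__ == "__main__":
    # X and Y are identical copies, so they stay dependent given Z.
    print(empirical_cmi([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]))
```

On small samples the plug-in estimate is biased upward, which is one reason a test needs a carefully chosen threshold rather than a comparison against zero.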


Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

arXiv.org Artificial Intelligence

Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction in graphical models called influence diagrams, we can answer two fundamental questions about an agent's incentives directly from the graph: (1) which nodes is the agent incentivized to observe, and (2) which nodes is the agent incentivized to influence? The answers tell us which information and influence points need extra protection. For example, we may want a classifier for job applications to not use the ethnicity of the candidate, and a reinforcement learning agent not to take direct control of its reward mechanism. Different algorithms and training paradigms can lead to different influence diagrams, so our method can be used to identify algorithms with problematic incentives and help in designing algorithms with better incentives.


Markov Networks: Undirected Graphical Models

#artificialintelligence

This article introduces Markov networks, which belong to the family of undirected graphical models (UGMs). It is a follow-up to an article on Bayesian networks, which are a type of directed graphical model. The key motivation behind these networks is to parameterize the joint probability distribution based on local independencies between random variables. A Bayesian network generally requires a predefined direction for each edge to assert the influence of one random variable on another. But there may be cases where the interactions between nodes (or random variables) are symmetric, and we would like a model that can represent this symmetry without directional influence.
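
To make the idea of symmetric interactions concrete, here is a minimal sketch of a three-node Markov network A - B - C with pairwise potentials, normalized by brute-force enumeration. The potential values and the names `agreement` and `joint_distribution` are assumptions chosen for this example.

```python
from itertools import product


def agreement(u, v):
    """Symmetric pairwise potential: neighbours that agree score higher.
    The values 2.0 and 1.0 are arbitrary illustrative choices."""
    return 2.0 if u == v else 1.0


def joint_distribution():
    """Joint over binary (A, B, C) for the chain A - B - C:
    P(a, b, c) = agreement(a, b) * agreement(b, c) / Z.
    """
    unnormalized = {}
    for a, b, c in product([0, 1], repeat=3):
        unnormalized[(a, b, c)] = agreement(a, b) * agreement(b, c)
    z = sum(unnormalized.values())  # partition function
    return {state: value / z for state, value in unnormalized.items()}


if __name__ == "__main__":
    for state, prob in sorted(joint_distribution().items()):
        print(state, round(prob, 4))
```

Because the potentials depend only on whether two neighbours agree, swapping the roles of A and B (or B and C) leaves the model unchanged, which is exactly the kind of symmetry that a directed edge cannot express directly.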


A cross-center smoothness prior for variational Bayesian brain tissue segmentation

arXiv.org Machine Learning

Suppose one is faced with the challenge of tissue segmentation in MR images, without annotators at their center to provide labeled training data. One option is to go to another medical center for a trained classifier. Sadly, tissue classifiers do not generalize well across centers due to voxel intensity shifts caused by center-specific acquisition protocols. However, certain aspects of segmentations, such as spatial smoothness, remain relatively consistent and can be learned separately. Here we present a smoothness prior that is fit to segmentations produced at another medical center. This informative prior is presented to an unsupervised Bayesian model. The model clusters the voxel intensities, such that it produces segmentations that are similarly smooth to those of the other medical center. In addition, the unsupervised Bayesian model is extended to a semi-supervised variant, which needs no visual interpretation of clusters into tissues.


Deep learning for molecular generation and optimization - a review of the state of the art

arXiv.org Machine Learning

In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules - in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.


Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

arXiv.org Machine Learning

We introduce a dynamic generative model, Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and topic models such as latent Dirichlet allocation. BAM is based on a Poisson process, whose events are marked by using a Bayesian network, where the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process turns out to be a Polya urn, an integer-valued self-reinforcing process. These urn processes, which we name Polya-Bayes processes, obey certain conditional independence properties that provide further insight about the nature of NTF. These insights also let us develop space-efficient simulation algorithms that respect the potential sparsity of data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating their marginal likelihood, which would be useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse data regime when contrasted with variational algorithms that become more accurate when the total sum of the elements of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms for various data regimes.
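
The self-reinforcing behaviour of a Polya urn can be seen in a few lines of simulation. The sketch below is the classic single-urn scheme only, not the Polya-Bayes process or the BAM sampler from the paper; the function name `polya_urn` and its arguments are assumptions for illustration.

```python
import random


def polya_urn(initial_counts, num_draws, seed=0):
    """Simulate a classic Polya urn: draw a colour with probability
    proportional to its current count, then add one more ball of that colour.
    Returns the final integer counts per colour.
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    for _ in range(num_draws):
        colour = rng.choices(range(len(counts)), weights=counts, k=1)[0]
        counts[colour] += 1  # reinforcement: drawn colours become more likely
    return counts


if __name__ == "__main__":
    print(polya_urn([1, 1, 1], num_draws=100))
```

Only integer counts are stored, so the state stays compact when most colours are rarely drawn.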


Deep Log-Likelihood Ratio Quantization

arXiv.org Machine Learning

In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a latent space with dimension equal to the number of sufficient statistics required to recover the inputs - equal to three in this case - while the decoder aims to reconstruct a noisy version of the latent representation with the purpose of modeling quantization effects in a differentiable way. Simulation results show that, when applied to a standard rate-1/2 low-density parity-check (LDPC) code, a finite precision compression factor of nearly three times is achieved when storing an entire codeword, with an incurred loss of performance lower than 0.1 dB compared to straightforward scalar quantization of the log-likelihood ratios.


Revisiting clustering as matrix factorisation on the Stiefel manifold

arXiv.org Machine Learning

Our approach leverages the well known Burer-Monteiro factorisation strategy from large scale optimisation, in the context of low rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.


Pragmatic classification of movement primitives for stroke rehabilitation

arXiv.org Machine Learning

Rehabilitation training is the primary intervention to improve motor recovery after stroke, but a tool to measure functional training does not currently exist. To bridge this gap, we previously developed an approach to classify functional movement primitives using wearable sensors and a machine learning (ML) algorithm. We found that this approach had encouraging classification performance but had computational and practical limitations, such as training time, sensor cost, and magnetic drift. Here, we sought to refine this approach and determine the algorithm, sensor configurations, and data requirements needed to maximize computational and practical performance. Motion data had been previously collected from 6 stroke patients wearing 11 inertial measurement units (IMUs) as they moved objects on a target array. To identify optimal ML performance, we evaluated 4 algorithms that are commonly used in activity recognition (linear discriminant analysis (LDA), naïve Bayes, support vector machine, and k-nearest neighbors). We compared their classification accuracy, computational complexity, and tuning requirements. To identify optimal sensor configuration, we progressively sampled fewer sensors and compared classification accuracy. To identify optimal data requirements, we compared accuracy using data from IMUs versus accelerometers. We found that LDA had the highest classification accuracy (92%) of the algorithms tested. It also was the most pragmatic, with low training and testing times and modest tuning requirements. We found that 7 sensors on the paretic arm and back resulted in the best accuracy. Using this array, accelerometers had a lower accuracy (84%). We refined strategies to accurately and pragmatically quantify functional movement primitives in stroke patients. We propose that this optimized ML-sensor approach could be a means to quantify training dose after stroke.