 Uncertainty


Meta-Learning for Stochastic Gradient MCMC

arXiv.org Machine Learning

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model, and even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design of the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler outperforms generic, hand-designed SG-MCMC algorithms and generalizes to different datasets and larger architectures.
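For orientation, a generic hand-designed SG-MCMC scheme of the kind the abstract contrasts against can be sketched with stochastic gradient Langevin dynamics (SGLD). This is not the paper's learned sampler. The toy model (a Gaussian mean with known noise), the function name `sgld_gaussian_mean`, and all default values are assumptions chosen for illustration.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=2000, batch_size=20, step=1e-3,
                       prior_std=10.0, noise_std=1.0, seed=0):
    """SGLD for the posterior over the mean of a Gaussian (toy assumption).

    Model: theta ~ N(0, prior_std^2), x_i ~ N(theta, noise_std^2).
    Each step uses a minibatch gradient rescaled to the full dataset
    and injects Gaussian noise with variance equal to the step size.
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    trace = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior.
        grad_prior = -theta / prior_std ** 2
        # Minibatch estimate of the full-data log-likelihood gradient.
        grad_lik = (n / batch_size) * sum(
            (x - theta) / noise_std ** 2 for x in batch
        )
        # Langevin update: half-step along the gradient plus injected noise.
        theta += 0.5 * step * (grad_prior + grad_lik)
        theta += math.sqrt(step) * rng.gauss(0.0, 1.0)
        trace.append(theta)
    return trace
```

After discarding an initial burn-in portion of the trace, the remaining values can be treated as approximate posterior samples. The drift and noise terms here are fixed by hand, which is the design choice the abstract proposes to learn instead.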


Trading algorithms with learning in latent alpha models

arXiv.org Machine Learning

Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.


Disintegration and Bayesian Inversion via String Diagrams

arXiv.org Artificial Intelligence

The notions of disintegration and Bayesian inversion are fundamental in conditional probability theory. They produce channels, as conditional probabilities, from a joint state, or from an already given channel (in the opposite direction). These notions exist in the literature, in concrete situations, but are presented here in abstract graphical formulations. The resulting abstract descriptions are used for proving basic results in conditional probability theory. The existence of disintegration and Bayesian inversion is discussed for discrete probability, and also for measure-theoretic probability, via standard Borel spaces and via likelihoods. Finally, the usefulness of disintegration and Bayesian inversion is illustrated in several examples.
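In the discrete case, both notions reduce to familiar computations, which the following minimal sketch illustrates. It does not use the paper's string-diagram formalism. The dictionary representation and the function names `disintegrate` and `bayesian_inversion` are assumptions made for this example.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(joint):
    """Split a joint distribution on X x Y into a marginal and a channel.

    `joint` maps (x, y) pairs to probabilities. Returns the marginal on X
    and a channel X -> distributions on Y, defined wherever the marginal
    is nonzero.
    """
    marginal = defaultdict(Fraction)
    for (x, _y), p in joint.items():
        marginal[x] += p
    channel = defaultdict(dict)
    for (x, y), p in joint.items():
        if marginal[x] > 0:
            channel[x][y] = p / marginal[x]
    return dict(marginal), dict(channel)


def bayesian_inversion(prior, channel):
    """Turn a channel X -> Y into a channel Y -> X, given a prior on X.

    Builds the joint with the components swapped, then disintegrates it.
    Assumes every x in `channel` also appears in `prior`.
    """
    swapped = {
        (y, x): prior[x] * p_y
        for x, dist in channel.items()
        for y, p_y in dist.items()
    }
    _marginal_y, inverse = disintegrate(swapped)
    return inverse
```

For a prior of 1/100 on a condition, a test with sensitivity 9/10 and a false-positive rate of 1/20, the inverted channel assigns probability 2/13 to the condition given a positive result.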


Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

arXiv.org Artificial Intelligence

Model-based control strategies are critical for sample-efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods when learning in continuous-state problems, and that the performance gap grows when moving to stochastic domains of increasing size.
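The interleaving of learning and planning that Dyna performs can be sketched in the tabular, deterministic setting. This is the classic tabular form, not the REM-Dyna method for continuous states. The function names, the dictionary-based model, and the default hyperparameters are assumptions for illustration.

```python
import random


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One-step Q-learning update on a dictionary-backed action-value table."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def dyna_step(Q, model, transition, actions, n_planning=5, rng=random):
    """Process one real transition, then run simulated planning updates.

    `model` is a hypothetical deterministic model mapping (s, a) to the
    last observed (r, s_next).
    """
    s, a, r, s_next = transition
    q_update(Q, s, a, r, s_next, actions)  # learn from real experience
    model[(s, a)] = (r, s_next)            # update the learned model
    for _ in range(n_planning):
        # Sample a previously seen state-action pair and replay the
        # model's prediction as if it were real experience.
        (ps, pa), (pr, ps_next) = rng.choice(list(model.items()))
        q_update(Q, ps, pa, pr, ps_next, actions)
```

Sampling uniformly from the stored pairs makes this sketch behave much like experience replay. The abstract's point is that a learned model allows other choices at that sampling step, such as drawing predecessor states.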


ML + FV = $\heartsuit$? A Survey on the Application of Machine Learning to Formal Verification

arXiv.org Artificial Intelligence

Formal Verification (FV) and Machine Learning (ML) can seem incompatible due to their opposite mathematical foundations and their use in real-life problems: FV mostly relies on discrete mathematics and aims at ensuring correctness; ML often relies on probabilistic models and consists of learning patterns from training data. In this paper, we postulate that they are complementary in practice, and explore how ML helps FV in its classical approaches: static analysis, model-checking, theorem-proving, and SAT solving. We draw a landscape of the current practice and catalog some of the most prominent uses of ML inside FV tools, thus offering a new perspective on FV techniques that can help researchers and practitioners to better locate the possible synergies. We discuss lessons learned from our work, point to possible improvements and offer visions for the future of the domain in the light of the science of software and systems modeling.


Stochastic seismic waveform inversion using generative adversarial networks as a geological prior

arXiv.org Machine Learning

We present an application of deep generative models in the context of partial differential equation (PDE)-constrained inverse problems. We combine a generative adversarial network (GAN) representing an a priori model that creates subsurface geological structures and their petrophysical properties, with the numerical solution of the PDE governing the propagation of acoustic waves within the earth's interior. We perform Bayesian inversion using an approximate Metropolis-adjusted Langevin algorithm (MALA) to sample from the posterior given seismic observations. Gradients with respect to the model parameters governing the forward problem are obtained by solving the adjoint of the acoustic wave equation. Gradients of the mismatch with respect to the latent variables are obtained by leveraging the differentiable nature of the deep neural network used to represent the generative model. We show that approximate MALA sampling allows efficient Bayesian inversion of model parameters obtained from a prior represented by a deep generative model, obtaining a diverse set of realizations that reflect the observed seismic response.
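The sampling step named above can be sketched with a standard (exact) MALA on a one-dimensional toy target. This is not the paper's approximate variant, and it involves no GAN prior or wave-equation adjoint. The function name `mala`, the scalar state, and the step size are assumptions for illustration.

```python
import math
import random


def mala(log_p, grad_log_p, x0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin algorithm for a scalar target.

    Proposals follow a Langevin step along grad_log_p plus Gaussian
    noise; a Metropolis-Hastings test corrects the discretization error.
    Returns the list of samples and the acceptance rate.
    """
    rng = random.Random(seed)

    def log_q(x_to, x_from):
        # Log density (up to a constant) of proposing x_to from x_from.
        mean = x_from + 0.5 * step ** 2 * grad_log_p(x_from)
        return -((x_to - mean) ** 2) / (2.0 * step ** 2)

    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        prop = x + 0.5 * step ** 2 * grad_log_p(x) + step * rng.gauss(0.0, 1.0)
        log_alpha = (log_p(prop) + log_q(x, prop)
                     - log_p(x) - log_q(prop, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

In the setting the abstract describes, `grad_log_p` would correspond to gradients with respect to the latent variables of the generative model, obtained by backpropagating through the network and the adjoint solve. Here it is simply supplied as a closed-form function.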


Embedding Words as Distributions with a Bayesian Skip-gram Model

arXiv.org Artificial Intelligence

We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.


Assumed Density Filtering Q-learning

arXiv.org Artificial Intelligence

While off-policy temporal difference (TD) methods have widely been used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods using Assumed Density Filtering (ADFQ), which updates beliefs on state-action values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs provide a natural regularization for learning, and we show how ADFQ reduces in a limiting case to the traditional Q-learning algorithm. Our empirical results demonstrate that the proposed ADFQ algorithms outperform comparable algorithms on several task domains. Moreover, our algorithms are computationally more efficient than other existing approaches to Bayesian reinforcement learning.


Top 5 Machine Learning Algorithms for Beginners – BMC Blogs

#artificialintelligence

Machine learning is a major component in the race towards artificial intelligence. Whether you're seeking true artificial intelligence or simply trying to gain insight from all the data you've been collecting, machine learning is a major step forward. But where to get started? If you're a beginner, machine learning can feel overwhelming – how to choose which algorithms to use, from the seemingly infinite options, and how to know just which one will provide the right predictions (data outputs). These top 5 machine learning algorithms for beginners offer a fine balance of ease, lower computational power, and immediate, accurate results.


Reconstructing networks with unknown and heterogeneous errors

arXiv.org Machine Learning

The vast majority of network datasets contain errors and omissions, although this is rarely accounted for in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by not only allowing for heterogeneous errors, but also for individual edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable error estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed and which yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.