Plotting

 Schauer, Moritz


Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs

arXiv.org Machine Learning

In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the "Causal Zig-Zag sampler", that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art.


Differentiating Metropolis-Hastings to Optimize Intractable Densities

arXiv.org Artificial Intelligence

We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.


Automatic Differentiation of Programs with Discrete Randomness

arXiv.org Artificial Intelligence

Automatic differentiation (AD), a technique for constructing new programs which compute the derivative of an original program, has become ubiquitous throughout scientific computing and deep learning due to the improved performance afforded by gradient-based optimization. However, AD systems have been restricted to the subset of programs that have a continuous dependence on parameters. Programs that have discrete stochastic behaviors governed by distribution parameters, such as flipping a coin with probability $p$ of being heads, pose a challenge to these systems because the connection between the result (heads vs tails) and the parameters ($p$) is fundamentally discrete. In this paper we develop a new reparameterization-based methodology that allows for generating programs whose expectation is the derivative of the expectation of the original program. We showcase how this method gives an unbiased and low-variance estimator which is as automated as traditional AD mechanisms. We demonstrate unbiased forward-mode AD of discrete-time Markov chains, agent-based models such as Conway's Game of Life, and unbiased reverse-mode AD of a particle filter. Our code package is available at https://github.com/gaurav-arya/StochasticAD.jl.


Nonparametric Bayesian volatility learning under microstructure noise

arXiv.org Machine Learning

Aiming at financial applications, we study the problem of learning the volatility under market microstructure noise. Specifically, we consider noisy discrete time observations from a stochastic differential equation and develop a novel computational method to learn the diffusion coefficient of the equation. We take a nonparametric Bayesian approach, where we model the volatility function a priori as piecewise constant. Its prior is specified via the inverse Gamma Markov chain. Sampling from the posterior is accomplished by incorporating the Forward Filtering Backward Simulation algorithm in the Gibbs sampler. Good performance of the method is demonstrated on two representative synthetic data examples. Finally, we apply the method on the EUR/USD exchange rate dataset.