edward2
Reviews: Simple, Distributed, and Accelerated Probabilistic Programming
In this submission, the authors describe the design, implementation, and performance of Edward2, a low-level probabilistic programming language that integrates seamlessly with TensorFlow, in particular TensorFlow Distributions. The key concept of Edward2 is the random variable, which in the context of Edward2 should be understood as a general Python function, possibly with random choices. Also, continuing the design decision of its first version, Edward2 follows the principle of exposing inference to users while providing them with enough components and combinators to make building custom inference routines easy. This differs from the principle behind other high-level probabilistic programming systems, which is to hide inference from users or automate it for them. The submission explains a wide range of benefits of exposing inference, such as a huge boost in the scalability of inference engines and support for non-standard inference tasks.
Automatic Reparameterisation of Probabilistic Programs
Gorinova, Maria I., Moore, Dave, Hoffman, Matthew D.
Probabilistic programming has emerged as a powerful paradigm in statistics, applied science, and machine learning: by decoupling modelling from inference, it promises to allow modellers to directly reason about the processes generating data. However, the performance of inference algorithms can be dramatically affected by the parameterisation used to express a model, requiring users to transform their programs in non-intuitive ways. We argue for automating these transformations, and demonstrate that mechanisms available in recent modeling frameworks can implement non-centring and related reparameterisations. This enables new inference algorithms, and we propose two: a simple approach using interleaved sampling and a novel variational formulation that searches over a continuous space of parameterisations. We show that these approaches enable robust inference across a range of models, and can yield more efficient samplers than the best fixed parameterisation.
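As a rough illustration of non-centring and of a continuous space of parameterisations, the sketch below draws a Normal(mu, sigma) variable through an auxiliary variable indexed by a parameter `lam` in [0, 1]. The specific interpolation used here is an assumption chosen for illustration and is not given in the abstract; the function name and arguments are hypothetical. With `lam = 1` the auxiliary variable is the original variable (centred), and with `lam = 0` it is standard normal (non-centred).

```python
import random
import statistics


def sample_partially_centred(mu, sigma, lam, rng):
    """Draw z ~ Normal(mu, sigma) via an auxiliary variable z_tilde.

    lam = 1.0 gives the centred form (z_tilde is z itself);
    lam = 0.0 gives the non-centred form (z_tilde is standard normal).
    """
    z_tilde = rng.gauss(lam * mu, sigma ** lam)
    z = mu + sigma ** (1.0 - lam) * (z_tilde - lam * mu)
    return z_tilde, z


rng = random.Random(0)
mu, sigma = 3.0, 0.5
for lam in (0.0, 0.5, 1.0):
    draws = [sample_partially_centred(mu, sigma, lam, rng)[1]
             for _ in range(20000)]
    # Every setting of lam should recover roughly mean 3.0 and std 0.5.
    print(lam, round(statistics.mean(draws), 2),
          round(statistics.stdev(draws), 2))
```

The marginal distribution of `z` is the same for every `lam`; what changes is the geometry of the space in which the sampler or variational family operates, which is why the choice can matter for inference.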
Simple, Distributed, and Accelerated Probabilistic Programming
Tran, Dustin, Hoffman, Matthew W., Moore, Dave, Suter, Christopher, Vasudevan, Srinivas, Radul, Alexey
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction—the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.
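One way to picture a random variable as a single abstraction is an object that pairs a distribution with one realised value and can be used wherever that value could. The sketch below is a toy in plain Python, not the TensorFlow implementation the abstract describes; the classes `Normal` and `RandomVariable` and their methods are hypothetical stand-ins.

```python
import random


class Normal:
    """A bare-bones distribution object (hypothetical, stdlib only)."""

    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

    def sample(self, rng):
        return rng.gauss(self.loc, self.scale)


class RandomVariable:
    """Pairs a distribution with one realised value and acts like that value."""

    def __init__(self, distribution, rng, value=None):
        self.distribution = distribution
        # Sample unless a value is supplied, e.g. when conditioning on data.
        self.value = distribution.sample(rng) if value is None else value

    def __float__(self):
        return float(self.value)

    def __add__(self, other):
        return float(self) + float(other)

    __radd__ = __add__

    def __mul__(self, other):
        return float(self) * float(other)

    __rmul__ = __mul__


rng = random.Random(0)
z = RandomVariable(Normal(0.0, 1.0), rng)            # latent variable
x = RandomVariable(Normal(2.0 * z + 1.0, 0.1), rng)  # z is used like a number
```

Because `z` behaves like its sampled value, a model is just ordinary numerical code that happens to create random variables along the way.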
Simple, Distributed, and Accelerated Probabilistic Programming
Tran, Dustin, Hoffman, Matthew, Moore, Dave, Suter, Christopher, Vasudevan, Srinivas, Radul, Alexey, Johnson, Matthew, Saurous, Rif A.
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction: the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.