Schaechtle, Ulrich, Zinberg, Ben, Radul, Alexey, Stathis, Kostas, Mansinghka, Vikash K.

Gaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improve as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.

Tompkins, Anthony (The University of Sydney) | Ramos, Fabio (The University of Sydney)

Gaussian Processes (GPs) provide an extremely powerful mechanism to model a variety of problems but incur an O(N 3 ) complexity in the number of data samples. Common approximation methods rely on what are often termed inducing points but still typically incur an O(NM 2 ) complexity in the data and corresponding inducing points. Using Random Fourier Feature (RFF) maps, we overcome this by transforming the problem into a Bayesian Linear Regression formulation upon which we apply a Bayesian Variational treatment that also allows learning the corresponding kernel hyperparameters, likelihood and noise parameters. In this paper we introduce an alternative method using Fourier series to obtain spectral representations of common kernels, in particular for periodic warpings, which surprisingly have a convergent, non-random form using special functions, requiring fewer spectral features to approximate their corresponding kernel to high accuracy. Using this, we can fuse the Random Fourier Feature spectral representations of common kernels with their periodic counterparts to show how they can more effectively and expressively learn patterns in time-series for both interpolation and extrapolation. This method combines robustness, scalability and equally importantly, interpretability through a symbolic declarative grammar that is both functionally and humanly intuitive — a property that is crucial for explainable decision making. Using probabilistic programming and Variational Inference we are able to efficiently optimise over these rich functional representations. We show significantly improved Gram matrix approximation errors, and also demonstrate the method in several time-series problems comparing other commonly used approaches such as recurrent neural networks.

Schaechtle, Ulrich, Saad, Feras, Radul, Alexey, Mansinghka, Vikash

There is a widespread need for techniques that can discover structure from time series data. Recently introduced techniques such as Automatic Bayesian Covariance Discovery (ABCD) provide a way to find structure within a single time series by searching through a space of covariance kernels that is generated using a simple grammar. While ABCD can identify a broad class of temporal patterns, it is difficult to extend and can be brittle in practice. This paper shows how to extend ABCD by formulating it in terms of probabilistic program synthesis. The key technical ideas are to (i) represent models using abstract syntax trees for a domain-specific probabilistic language, and (ii) represent the time series model prior, likelihood, and search strategy using probabilistic programs in a sufficiently expressive language. The final probabilistic program is written in under 70 lines of probabilistic code in Venture. The paper demonstrates an application to time series clustering that involves a non-parametric extension to ABCD, experiments for interpolation and extrapolation on real-world econometric data, and improvements in accuracy over both non-parametric and standard regression baselines.

Duvenaud, David, Lloyd, James Robert, Grosse, Roger, Tenenbaum, Joshua B., Ghahramani, Zoubin

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art. We define a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels. We present a method for searching over this space of structures which mirrors the scientific discovery process. The learned structures can often decompose functions into interpretable components and enable long-range extrapolation on time-series datasets. Our structure search method outperforms many widely used kernels and kernel combination methods on a variety of prediction tasks.

Probabilistic programming languages represent complex data with intermingled models in a few lines of code. Efficient inference algorithms in probabilistic programming languages make possible to build unified frameworks to compute interesting probabilities of various large, real-world problems. When the structure of model is given, constructing a probabilistic program is rather straightforward. Thus, main focus have been to learn the best model parameters and compute marginal probabilities. In this paper, we provide a new perspective to build expressive probabilistic program from continue time series data when the structure of model is not given. The intuition behind of our method is to find a descriptive covariance structure of time series data in nonparametric Gaussian process regression. We report that such descriptive covariance structure efficiently derives a probabilistic programming description accurately.