GPRat: Gaussian Process Regression with Asynchronous Tasks
Helmann, Maksim, Strack, Alexander, Pflüger, Dirk
Python is the de facto language for software development in artificial intelligence (AI). Commonly used libraries, such as PyTorch and TensorFlow, rely on parallelization built into their BLAS backends to achieve speedup on CPUs. However, only applying parallelization in a low-level backend can lead to performance and scaling degradation. In this work, we present a novel way of binding task-based C++ code built on the asynchronous runtime model HPX to a high-level Python API using pybind11. We develop a parallel Gaussian process (GP) library as an application. The resulting Python library GPRat combines the ease of use of commonly available GP libraries with the performance and scalability of asynchronous runtime systems. We evaluate the performance on a mass-spring-damper system, a standard benchmark from control theory, for varying numbers of regressors (features). The results show almost no binding overhead when binding the asynchronous HPX code using pybind11. Compared to GPyTorch and GPflow, GPRat shows superior scaling on up to 64 cores on an AMD EPYC 7742 CPU for training. Furthermore, our library achieves a prediction speedup of 7.63 over GPyTorch and 25.25 over GPflow. If we increase the number of features from eight to 128, we observe speedups of 29.62 and 21.19, respectively. These results showcase the potential of using asynchronous tasks within Python-based AI applications.
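GPRat's task runtime and bindings are C++/HPX, which cannot be reproduced in a short snippet here. As a rough, library-agnostic illustration of the task-based dataflow model the abstract describes — independent tasks running concurrently, a dependent task firing once its inputs are ready — the Python sketch below uses `concurrent.futures`. All names are illustrative and not part of GPRat's API.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # A toy "task"; in an HPX-style runtime each tile operation of,
    # e.g., a tiled Cholesky factorization would be such a task.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    fa = pool.submit(square, 3)   # independent task
    fb = pool.submit(square, 4)   # independent task, may run concurrently
    # This task consumes both futures, so it effectively runs only
    # after its two dependencies have produced their results.
    fc = pool.submit(lambda: fa.result() + fb.result())
    result = fc.result()
```

Unlike BLAS-internal threading, this style exposes the dependency graph to the scheduler, which is what lets an asynchronous runtime overlap independent work.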
GitHub - GPflow/GPflow: Gaussian processes in TensorFlow
GPflow is a package for building Gaussian process models in Python. It implements modern Gaussian process inference for composable kernels and likelihoods. GPflow builds on TensorFlow 2.4 and TensorFlow Probability for running computations, which allows fast execution on GPUs. The online documentation (for the latest release and the develop branch) contains more details. It was originally created by James Hensman and Alexander G. de G. Matthews.
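GPflow expresses composable kernels as objects (in GPflow 2 you can write, e.g., `SquaredExponential() + Linear()`). The plain-Python sketch below illustrates the closure property this relies on — sums and products of valid kernels are again valid kernels. The helper names are mine for illustration, not GPflow's API.

```python
import math

def rbf(lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return lambda a, b: math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2))

def linear(variance=1.0):
    """Linear (dot-product) kernel on scalar inputs."""
    return lambda a, b: variance * a * b

def add(k1, k2):
    """Sum of two kernels is a kernel."""
    return lambda a, b: k1(a, b) + k2(a, b)

def mul(k1, k2):
    """Product of two kernels is a kernel."""
    return lambda a, b: k1(a, b) * k2(a, b)

# A composite kernel built from the primitives above:
composite = add(rbf(), linear())
```

Libraries like GPflow wrap the same idea in classes so that hyperparameters of each component stay trainable after composition.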
Modern Gaussian Process Regression
Ever wonder how you can create non-parametric supervised learning models with unlimited expressive power? Look no further than Gaussian Process Regression (GPR), an algorithm that learns to make predictions almost entirely from the data itself (with a little help from hyperparameters). Combining this algorithm with recent advances in computing, such as automatic differentiation, allows for applying GPRs to solve a variety of supervised machine learning problems in near-real-time. In this article, we'll discuss how to implement and use GPR in practice. This is the second article in my GPR series. For a rigorous, ab initio introduction to Gaussian Process Regression, please check out my previous article here. Before we dive into how we can implement and use GPR, let's quickly review the mechanics and theory behind this supervised machine learning algorithm.
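Those mechanics boil down to the standard GP posterior equations: mean mu* = k*^T (K + sigma^2 I)^-1 y and variance s*^2 = k(x*, x*) - k*^T (K + sigma^2 I)^-1 k*. A dependency-free sketch of them follows, using scalar inputs, a squared-exponential kernel, and naive O(n^3) Gaussian elimination (a real library would use a Cholesky factorization and automatic differentiation for the hyperparameters):

```python
import math

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def solve(A, b):
    """Solve A x = b via Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(X, y, x_star, noise=1e-6):
    """Posterior mean and variance at x_star for a zero-mean GP."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)                              # (K + noise*I)^-1 y
    k_star = [rbf(x, x_star) for x in X]
    mean = sum(k_star[i] * alpha[i] for i in range(n))
    v = solve(K, k_star)                             # (K + noise*I)^-1 k_*
    var = rbf(x_star, x_star) - sum(k_star[i] * v[i] for i in range(n))
    return mean, var
```

With near-zero noise the posterior mean interpolates the training targets exactly, and the posterior variance collapses at the training inputs — the behavior the article's plots illustrate.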
Gaussian Process Regression on Molecules in GPflow
This post demonstrates how to train a Gaussian Process (GP) to predict molecular properties using the GPflow library by creating a custom-defined Tanimoto kernel to operate on Morgan fingerprints. In this example, we'll be trying to predict the experimentally-determined electronic transition wavelengths of molecular photoswitches, a class of molecule that undergoes a reversible transformation between its E and Z isomers upon irradiation by light. We'll start by importing all of the machine learning and chemistry libraries we're going to use. For our molecular representation, we're going to be working with the widely-used Morgan fingerprints. Under this representation, molecules are represented as bit vectors.
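On binary fingerprints, the Tanimoto similarity reduces to the Jaccard index |a AND b| / |a OR b|; the post's GPflow kernel uses the continuous generalization <a,b> / (|a|^2 + |b|^2 - <a,b>), which coincides with it on bit vectors. A minimal sketch of the bit-vector form (the all-zero convention is my choice, not the post's):

```python
def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    inter = sum(x & y for x, y in zip(a, b))   # bits set in both
    union = sum(x | y for x, y in zip(a, b))   # bits set in either
    return inter / union if union else 1.0     # two all-zero vectors: define as 1
```

Wrapped as a GPflow kernel (a class overriding `K`), this scores molecules as more covariant the more substructure bits their Morgan fingerprints share.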
A Framework for Interdomain and Multioutput Gaussian Processes
van der Wilk, Mark, Dutordoir, Vincent, John, ST, Artemev, Artem, Adam, Vincent, Hensman, James
One obstacle to the use of Gaussian processes (GPs) in large-scale problems, and as a component in deep learning systems, is the need for bespoke derivations and implementations for small variations in the model or inference. In order to improve the utility of GPs, we need a modular system that allows rapid implementation and testing, as seen in the neural network community. We present a mathematical and software framework for scalable approximate inference in GPs, which combines interdomain approximations and multiple outputs. Our framework, implemented in GPflow, provides a unified interface for many existing multioutput models, as well as more recent convolutional structures. This simplifies the creation of deep models with GPs, and we hope that this work will encourage more interest in this approach.
MOGPTK: The Multi-Output Gaussian Process Toolkit
de Wolff, Taco, Cuevas, Alejandro, Tobar, Felipe
GPs are designed through parametrizing a covariance kernel, meaning that constructing expressive kernels allows for an improved representation of complex signals. Recent advances extend the GP concept to multiple series (or channels), where both auto-correlations and cross-correlations among channels are designed jointly; we refer to these models as multi-output GP (MOGP) models. A key attribute of MOGPs is that appropriate cross-correlations allow for improved data-imputation and prediction tasks when the channels have missing data. Popular MOGP models include: i) the Linear Model of Coregionalization (LMC) [2], ii) the Cross-Spectral Mixture (CSM) [3], iii) the Convolutional Model (CONV) [4], and iv) the Multi-Output Spectral Mixture (MOSM) [5]. Training MOGPs is challenging due to the large number of parameters required to model all the cross-correlations, and the fact that most MOGP models are parametrized in the spectral domain, thus being prone to local minima. Therefore, a unified framework that implements these MOGPs is required both by the GP research community and by those interested in practical applications for multi-channel data.
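As one concrete instance, the LMC builds every cross-covariance as a weighted sum of shared latent kernels, K_ij(x, x') = sum_q B_q[i][j] * k_q(x, x'), where each B_q is a coregionalization matrix. The sketch below illustrates that construction in plain Python; the names and shapes are illustrative, not MOGPTK's API.

```python
import math

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def lmc_cov(i, j, x, x_prime, B, base_kernels):
    """LMC cross-covariance between channels i and j:
    K_ij(x, x') = sum_q B_q[i][j] * k_q(x, x')."""
    return sum(Bq[i][j] * kq(x, x_prime) for Bq, kq in zip(B, base_kernels))
```

The off-diagonal entries of B_q are exactly the cross-correlations that let one channel's observations inform predictions on another channel with missing data.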
Fitting Gaussian Process Models in Python
Written by Chris Fonnesbeck, Assistant Professor of Biostatistics, Vanderbilt University Medical Center. You can view, fork, and play with this project on the Domino data science platform. A common applied statistics task involves building regression models to characterize non-linear relationships between variables. It is possible to fit such models by assuming a particular non-linear functional form, such as a sinusoidal, exponential, or polynomial function, to describe one variable's response to the variation in another. Unless this relationship is obvious from the outset, however, it involves possibly extensive model selection procedures to ensure the most appropriate model is retained. Alternatively, a non-parametric approach can be adopted by defining a set of knots across the variable space and using a spline or kernel regression to describe arbitrary non-linear relationships.
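The kernel-regression alternative mentioned above can be sketched as a Nadaraya-Watson estimator: a locally weighted average of the observed responses, with weights given by a Gaussian kernel. This is an illustrative sketch (the bandwidth value is arbitrary), not code from the article.

```python
import math

def gaussian_weight(x, xi, bandwidth=0.5):
    """Gaussian kernel weight of observation xi relative to query x."""
    return math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2))

def kernel_regression(X, y, x_star, bandwidth=0.5):
    """Nadaraya-Watson estimate at x_star: weighted average of y."""
    w = [gaussian_weight(x_star, xi, bandwidth) for xi in X]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

Unlike an assumed sinusoidal or polynomial form, nothing here commits to a global functional shape — the bandwidth alone controls how local the fit is, which is the flexibility that motivates the GP treatment in the rest of the article.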