
Collaborating Authors

Artemev, Artem


Recommendations for Baselines and Benchmarking Approximate Gaussian Processes

arXiv.org Artificial Intelligence

Gaussian processes (GPs) are a mature and widely-used component of the ML toolbox. One of their desirable qualities is automatic hyperparameter selection, which allows for training without user intervention. However, in many realistic settings, approximations are needed, and these typically do require tuning. We argue that this requirement for tuning complicates evaluation, and has led to a lack of clear recommendations on which method should be used in which situation. To address this, we make recommendations for comparing GP approximations based on a specification of what a user should expect from a method. In addition, we develop a training procedure for the variational method of Titsias [2009] that leaves no choices to the user, and show that this is a strong baseline that meets our specification. We conclude that benchmarking according to our suggestions gives a clearer view of the current state of the field, and uncovers open problems that future papers should address.
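As a rough illustration of the kind of baseline referred to above, the snippet below fits a sparse GP regression model with Titsias' (2009) collapsed variational bound using GPflow's SGPR class. It is a minimal sketch, not the paper's automatic training procedure: the toy data, kernel choice, inducing-point initialisation and optimiser are all assumptions made for illustration.

import numpy as np
import gpflow

# Toy 1D regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
Y = np.sin(X) + 0.1 * rng.standard_normal(X.shape)

# Sparse GP regression with Titsias' (2009) collapsed variational bound.
# Inducing points are initialised on a data subset; the paper's automatic
# procedure for setting them is not reproduced here.
Z = X[:50].copy()
model = gpflow.models.SGPR(
    data=(X, Y),
    kernel=gpflow.kernels.SquaredExponential(),
    inducing_variable=Z,
)

# Maximise the evidence lower bound w.r.t. hyperparameters and inducing points.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)
print("ELBO after training:", model.elbo().numpy())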


Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees

arXiv.org Machine Learning

Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stability of scalable sparse approximations based on inducing points. To do so, we first review numerical stability, and illustrate typical situations in which Gaussian process models can be unstable. Building on stability theory originally developed in the interpolation literature, we derive sufficient and in certain cases necessary conditions on the inducing points for the computations performed to be numerically stable. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions. This is done via a modification of the cover tree data structure, which is of independent interest. We additionally propose an alternative sparse approximation for regression with a Gaussian likelihood which trades off a small amount of performance to further improve stability. We provide illustrative examples showing the relationship between stability of calculations and predictive performance of inducing point methods on spatial tasks.
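The cover tree construction itself is beyond the scope of an abstract; as a much simpler stand-in, the sketch below greedily selects inducing points that respect a minimum separation, which is the kind of condition the paper derives. The radius and the greedy strategy are illustrative assumptions, not the paper's algorithm.

import numpy as np

def greedy_min_separation(X, radius):
    # Greedily keep points so that any two selected points are at least
    # `radius` apart (a crude stand-in for the cover-tree construction).
    selected = []
    for x in X:
        if all(np.linalg.norm(x - z) >= radius for z in selected):
            selected.append(x)
    return np.stack(selected)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))       # e.g. spatial coordinates
Z = greedy_min_separation(X, radius=0.1)    # well-separated inducing points
print(Z.shape)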


Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow

arXiv.org Artificial Intelligence

We present Trieste, an open-source Python package for Bayesian optimization and active learning benefiting from the scalability and efficiency of TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based models within sequential decision-making loops, e.g. Gaussian processes from GPflow or GPflux, or neural networks from Keras. This modular mindset is central to the package and extends to our acquisition functions and the internal dynamics of the decision-making loop, both of which can be tailored and extended by researchers or engineers when tackling custom use cases. Trieste is a research-friendly and production-ready toolkit backed by a comprehensive test suite and extensive documentation, and is available at https://github.com/secondmind-labs/trieste.
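The sketch below is not Trieste's API; it is a hand-rolled GPflow-based version of the sequential decision-making loop that Trieste packages behind its model and acquisition-function abstractions. The toy objective, the grid of candidates and the expected-improvement acquisition are assumptions for illustration.

import numpy as np
from scipy.stats import norm
import gpflow

def objective(x):                            # toy black-box function (stand-in)
    return np.sin(3 * x) + x ** 2 - 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=(5, 1))
Y = objective(X)

for step in range(10):
    # Refit a GP surrogate to all observations collected so far.
    model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.Matern52())
    gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

    # Expected improvement (minimisation) over a dense grid of candidates.
    candidates = np.linspace(-1, 2, 200).reshape(-1, 1)
    mean, var = model.predict_f(candidates)
    mean, std = mean.numpy().ravel(), np.sqrt(var.numpy().ravel())
    std = np.maximum(std, 1e-9)
    gamma = (Y.min() - mean) / std
    ei = std * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

    # Query the objective at the most promising candidate.
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objective(x_next)])

print("best observed value:", Y.min())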


Memory Safe Computations with XLA Compiler

arXiv.org Machine Learning

Software packages like TensorFlow and PyTorch are designed to support linear algebra operations, and their speed and usability determine their success. However, by prioritising speed, they often neglect memory requirements. As a consequence, implementations of memory-intensive algorithms that are convenient in terms of software design often cannot be run for large problems due to memory overflows. Memory-efficient solutions require complex programming approaches with significant logic outside the computational framework. This impairs the adoption and use of such algorithms. To address this, we developed an XLA compiler extension that adjusts the computational data-flow representation of an algorithm according to a user-specified memory limit. We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device, where standard implementations would have failed. Our approach leads to better use of hardware resources. We believe that further focus on removing memory constraints at a compiler level will widen the range of machine learning methods that can be developed in the future.
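For concreteness, the NumPy sketch below shows the manual chunking that such memory-intensive computations otherwise require: a brute-force k-nearest-neighbour search that processes queries in blocks, so the full pairwise-distance matrix is never materialised. The sizes and the chunking strategy are illustrative; the compiler extension is meant to perform this kind of split automatically from a user-specified memory limit.

import numpy as np

def knn_chunked(X, Q, k, chunk_size=500):
    # Brute-force k-NN with the query set processed in chunks, so only a
    # (chunk_size, N) block of squared distances exists at any one time.
    X_sq = (X ** 2).sum(axis=1)
    neighbours = []
    for start in range(0, len(Q), chunk_size):
        q = Q[start:start + chunk_size]
        d2 = (q ** 2).sum(axis=1)[:, None] + X_sq[None, :] - 2.0 * q @ X.T
        neighbours.append(np.argpartition(d2, k, axis=1)[:, :k])
    return np.concatenate(neighbours, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((20_000, 10))
Q = rng.standard_normal((5_000, 10))
print(knn_chunked(X, Q, k=5).shape)   # (5000, 5)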


Barely Biased Learning for Gaussian Process Regression

arXiv.org Machine Learning

Recent work in scalable approximate Gaussian process regression has discussed a bias-variance-computation trade-off when estimating the log marginal likelihood. We suggest a method that adaptively selects the amount of computation to use when estimating the log marginal likelihood so that the bias of the objective function is guaranteed to be small. While simple in principle, our current implementation of the method is not competitive computationally with existing approximations.
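For reference, the objective whose estimate is being controlled is the exact log marginal likelihood of GP regression (a standard identity, not a result specific to this paper); the bias-variance-computation trade-off arises because iterative estimators of the two expensive terms below are truncated:

\[
\log p(\mathbf{y} \mid \theta)
  = -\tfrac{1}{2}\,\mathbf{y}^{\top} K_{\sigma}^{-1} \mathbf{y}
    \;-\; \tfrac{1}{2} \log \lvert K_{\sigma} \rvert
    \;-\; \tfrac{N}{2} \log 2\pi,
\qquad K_{\sigma} = K_{\mathbf{ff}} + \sigma^{2} I .
\]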


GPflux: A Library for Deep Gaussian Processes

arXiv.org Machine Learning

We introduce GPflux, a Python library for Bayesian deep learning with a strong emphasis on deep Gaussian processes (DGPs). Implementing DGPs is a challenging endeavour due to the various mathematical subtleties that arise when dealing with multivariate Gaussian distributions and the complex bookkeeping of indices. To date, there are no actively maintained, open-source and extensible libraries available that support research activities in this area. GPflux aims to fill this gap by providing a library with state-of-the-art DGP algorithms, as well as building blocks for implementing novel Bayesian and GP-based hierarchical models and inference schemes. GPflux is compatible with and built on top of the Keras deep learning ecosystem. This enables practitioners to leverage tools from the deep learning community for building and training customised Bayesian models, and to create hierarchical models that consist of Bayesian and standard neural network layers in a single coherent framework. GPflux relies on GPflow for most of its GP objects and operations, which makes it an efficient, modular and extensible library with a lean codebase.
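A two-layer DGP in the style of the GPflux tutorials is sketched below; exact class names, defaults and the training-model interface may differ between GPflux versions, so treat this as an assumed outline rather than canonical usage.

import numpy as np
import tensorflow as tf
import gpflow
import gpflux

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
Y = np.cos(10 * X) + 0.1 * rng.standard_normal(X.shape)
Z = np.linspace(0, 1, 20).reshape(-1, 1)

def make_gp_layer():
    # One sparse variational GP layer with its own kernel and inducing points.
    return gpflux.layers.GPLayer(
        gpflow.kernels.SquaredExponential(),
        gpflow.inducing_variables.InducingPoints(Z.copy()),
        num_data=X.shape[0],
        num_latent_gps=1,
    )

# Two GP layers followed by a Gaussian likelihood layer.
deep_gp = gpflux.models.DeepGP(
    [make_gp_layer(), make_gp_layer()],
    gpflux.layers.LikelihoodLayer(gpflow.likelihoods.Gaussian()),
)

# GPflux exposes training through the Keras fit loop (call signature as in
# the GPflux tutorials; it may vary across versions).
training_model = deep_gp.as_training_model()
training_model.compile(tf.optimizers.Adam(0.01))
training_model.fit({"inputs": X, "targets": Y}, Y, epochs=100, verbose=0)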


Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

arXiv.org Machine Learning

We propose a lower bound on the log marginal likelihood of Gaussian process regression models that can be computed without matrix factorisation of the full kernel matrix. We show that approximate maximum likelihood learning of model parameters by maximising our lower bound retains many of the benefits of the sparse variational approach while reducing the bias introduced into parameter learning. The basis of our bound is a more careful analysis of the log-determinant term appearing in the log marginal likelihood, as well as using the method of conjugate gradients to derive tight lower bounds on the term involving a quadratic form. Our approach is a step forward in unifying methods relying on lower bound maximisation (e.g. variational methods) and iterative approaches based on conjugate gradients for training Gaussian processes. In experiments, we show improved predictive performance with our model for a comparable amount of training time compared to other conjugate gradient based approaches.
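The quadratic-form part of such a bound can be sketched from a standard identity (the paper's full bound also treats the log-determinant term and uses preconditioning, which are omitted here). For any vector \(\mathbf{v}\), writing \(K_{\sigma} = K_{\mathbf{ff}} + \sigma^{2} I\) and residual \(\mathbf{r} = \mathbf{y} - K_{\sigma}\mathbf{v}\):

\[
\mathbf{y}^{\top} K_{\sigma}^{-1} \mathbf{y}
  = 2\,\mathbf{v}^{\top}\mathbf{y} - \mathbf{v}^{\top} K_{\sigma} \mathbf{v}
    + \mathbf{r}^{\top} K_{\sigma}^{-1} \mathbf{r}
  \;\le\; 2\,\mathbf{v}^{\top}\mathbf{y} - \mathbf{v}^{\top} K_{\sigma} \mathbf{v}
    + \frac{\mathbf{r}^{\top}\mathbf{r}}{\sigma^{2}},
\]
so that
\[
-\tfrac{1}{2}\,\mathbf{y}^{\top} K_{\sigma}^{-1} \mathbf{y}
  \;\ge\; -\mathbf{v}^{\top}\mathbf{y}
    + \tfrac{1}{2}\,\mathbf{v}^{\top} K_{\sigma} \mathbf{v}
    - \frac{\mathbf{r}^{\top}\mathbf{r}}{2\sigma^{2}} .
\]
Running conjugate gradients on \(K_{\sigma}\mathbf{v} = \mathbf{y}\) drives the residual \(\mathbf{r}\) towards zero and therefore tightens this lower bound, recovering the exact quadratic term at convergence.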


Scalable Thompson Sampling using Sparse Gaussian Process Models

arXiv.org Machine Learning

Thompson Sampling (TS) with Gaussian Process (GP) models is a powerful tool for optimizing non-convex objective functions. Despite favorable theoretical properties, the computational complexity of the standard algorithms quickly becomes prohibitive as the number of observation points (i.e. the time horizon) grows. Scalable TS methods can be implemented using sparse GP models, but at the price of an approximation error that invalidates the existing regret bounds. Here, we prove regret bounds for TS based on approximate GP posteriors, whose application to sparse GPs shows that the improvement in computational complexity can be achieved with no loss in terms of the order of regret performance.
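A naive single Thompson-sampling step with a sparse GP posterior might look like the sketch below: fit an SVGP model, draw one joint posterior sample over a grid of candidates, and query the objective at that sample's minimiser. The grid, model settings and use of an exact joint sample are illustrative assumptions, not the paper's algorithm (whose contribution is the regret analysis).

import numpy as np
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 1))                 # observations so far
Y = np.sin(12 * X) + 0.1 * rng.standard_normal(X.shape)

# Sparse variational GP posterior with 30 inducing points.
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.Matern52(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=np.linspace(0, 1, 30).reshape(-1, 1),
    num_data=X.shape[0],
)
gpflow.optimizers.Scipy().minimize(
    model.training_loss_closure((X, Y)), model.trainable_variables
)

# Thompson step: sample f jointly at candidate points, pick the minimiser.
candidates = np.linspace(0, 1, 500).reshape(-1, 1)
mean, cov = model.predict_f(candidates, full_cov=True)
sample = rng.multivariate_normal(mean.numpy().ravel(), cov.numpy()[0])
x_next = candidates[np.argmin(sample)]                # next point to query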


Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

arXiv.org Machine Learning

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the schedule dynamically in a data-driven way remains an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an autoregressive formulation, that flexibly adjusts to abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well-suited to tackle a set of problems: first, the online adaptation of the learning rate for a cold-started run; then, tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.


Translation Insensitivity for Deep Convolutional Gaussian Processes

arXiv.org Machine Learning

Deep learning has been at the foundation of large improvements in image classification. To improve the robustness of predictions, Bayesian approximations have been used to learn parameters in deep neural networks. We follow an alternative approach, using Gaussian processes as building blocks for Bayesian deep learning models, which has recently become viable due to advances in inference for convolutional and deep structures. We investigate deep convolutional Gaussian processes and identify a problem that holds back current performance. To remedy the issue, we introduce a translation insensitive convolutional kernel, which removes the restriction of requiring identical outputs for identical patch inputs. We show empirically that this convolutional kernel improves performance in both shallow and deep models. On MNIST, FASHION-MNIST and CIFAR-10 we improve on previous GP models in terms of accuracy, while also providing better-calibrated predictive probabilities than simple DNN models.
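Schematically (notation assumed here, not taken from the paper), a convolutional GP kernel sums a patch-response kernel over all pairs of patches, while the translation insensitive variant additionally weights each pair by a kernel over patch locations, so identical patches at different positions no longer make identical contributions:

\[
k_{\mathrm{conv}}(\mathbf{x}, \mathbf{x}')
  = \sum_{p}\sum_{p'} k_{\mathrm{patch}}\!\big(\mathbf{x}^{[p]}, \mathbf{x}'^{[p']}\big),
\qquad
k_{\mathrm{ti}}(\mathbf{x}, \mathbf{x}')
  = \sum_{p}\sum_{p'} k_{\mathrm{patch}}\!\big(\mathbf{x}^{[p]}, \mathbf{x}'^{[p']}\big)\, k_{\mathrm{loc}}(p, p'),
\]
where \(\mathbf{x}^{[p]}\) denotes the \(p\)-th patch of image \(\mathbf{x}\).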