AITopics | Education

Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating O(N3D3) computational cost when training on N points in D input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-Dsetting, the high-N, high-Dsetting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size N nor the full dimensionality D. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.

artificial intelligence, machine learning, optimization problem, (15 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report (0.68)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Add feedback

fantasization_with_svgps

Wesley Maddox

Neural Information Processing SystemsApr-25-2026, 09:38:50 GMT

With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making. However, Gaussian processes typically require at least O(n2) computations for n training points, limiting their general applicability. Stochastic variational Gaussian processes (SVGPs) can provide scalable inference for a dataset of fixed size, but are difficult to efficiently condition on new data. We propose online variational conditioning (OVC), a procedure for efficiently conditioning SVGPs in an online setting that does not require re-training through the evidence lower bound with the addition of new data. OVC enables the pairing of SVGPs with advanced lookahead acquisition functions for black-box optimization, even with non-Gaussian likelihoods. We show OVC provides compelling performance in a range of applications including active learning of malaria incidence, and reinforcement learning on MuJoCo simulated robotic control tasks.

artificial intelligence, machine learning, optimization problem, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Industry:

Health & Medicine (1.00)
Education > Educational Setting > Online (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

184a03a3ad07e8897c62461c02634b02-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 09:37:48 GMT

machine learning, reinforcement learning, teaching, (19 more...)

Neural Information Processing Systems

Country:

Asia (0.28)
Europe (0.28)

Genre: Research Report (0.46)

Industry:

Education (0.70)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

1838feeb71c4b4ea524d0df2f7074245-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-25-2026, 09:20:34 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry:

Transportation > Ground > Road (0.71)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

31839b036f63806cba3f47b93af8ccb5-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 09:19:44 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

Add feedback

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Neural Information Processing SystemsApr-25-2026, 06:51:41 GMT

The Transformer and its variants have been proven to be efficient sequence learners in many different domains. Despite their staggering success, a critical issue has been the enormous number of parameters that must be trained (ranging from 107 to 1011) along with the quadratic complexity of dot-product attention. In this work, we investigate the problem of approximating the two central components of the Transformer -- multi-head self-attention and point-wise feed-forward transformation, with reduced parameter space and computational complexity. We build upon recent developments in analyzing deep neural networks as numerical solvers of ordinary differential equations. Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers. We perform exhaustive experiments with TransEvolve on well-known encoder-decoder as well as encoder-only tasks. We observe that the degree of approximation (or inversely, the degree of parameter reduction) has different effects on the performance, depending on the task. While in the encoder-decoder regime, TransEvolvedelivers performances comparable to the original Transformer, in encoder-only tasks it consistently outperforms Transformer along with several subsequent variants.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Neural Information Processing SystemsApr-25-2026, 06:33:56 GMT

In this paper, we study the generalization properties of Model-Agnostic MetaLearning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over mtasks, each with ndata points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for strongly convex objective functions, the expected excess population loss is bounded by O(1/mn). Second, we consider the MAML algorithm's generalization to an unseen task and show that the resulting generalization error depends on the total variation distance between the underlying distributions of the new task and the tasks observed during the training process. Our proof techniques rely on the connections between algorithmic stability and generalization bounds of algorithms. In particular, we propose a new definition of stability for meta-learning algorithms, which allows us to capture the role of both the number of tasks mand number of samples per task non the generalization error of MAML.

artificial intelligence, generalization error, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Industry:

Education (0.48)
Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Automatic Data Augmentation for Generalization in Reinforcement Learning

Neural Information Processing SystemsApr-25-2026, 06:15:56 GMT

Deep reinforcement learning (RL) agents often fail to generalize beyond their training environments. To alleviate this problem, recent work has proposed the use of data augmentation. However, different tasks tend to benefit from different types of augmentations and selecting the right one typically requires expert knowledge. In this paper, we introduce three approaches for automatically finding an effective augmentation for any RL task. These are combined with two novel regularization terms for the policy and value function, required to make the use of data augmentation theoretically sound for actor-critic algorithms. Our method achieves a new state-of-the-art1on the Procgen benchmark and outperforms popular RL algorithms on DeepMind Control tasks with distractors. In addition, our agent learns policies and representations which are more robust to changes in the environment that are irrelevant for solving the task, such as the background.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Filters

Collaborating Authors

Education

334467d41d5cf21e234465a1530ba647-Paper.pdf

32e54441e6382a7fbacbbbaf3c450059-Supplemental.pdf

Scaling Gaussian Processes with Derivative Information Using Variational Inference

fantasization_with_svgps

184a03a3ad07e8897c62461c02634b02-Paper-Conference.pdf

1838feeb71c4b4ea524d0df2f7074245-Paper-Datasets_and_Benchmarks.pdf

31839b036f63806cba3f47b93af8ccb5-Paper.pdf

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Automatic Data Augmentation for Generalization in Reinforcement Learning