AITopics | extrapolate

Modern learning systems excel at interpolation but struggle to generalize to unseen tasks outside the training distribution's support. This failure occurs even in simple settings, such as handling task parameters beyond the training range, and persists despite advances in foundation models. To address this, we develop the Relational Task Extrapolator (RTE), an algorithm designed to enable systematic extrapolation to novel tasks. The key observation is that extrapolation is inherently relational: extrapolating to unseen tasks requires learning how tasks transform into one another. If a model learns the transformation between tasks A and B during training, it can apply that same transformation to relate known tasks to unseen ones at test time. RTE operationalizes this idea by decomposing each target task into a known anchor task and a transformation linking the anchor and target. It then learns a relational operator that maps an anchor-transformation pair to predictions for the target task. We instantiate RTE across multiple task extrapolation regimes in function prediction, e.g., where target tasks use out-of-range parameters (parameter extrapolation), have greater compositional depth (length extrapolation), and/or recombine function primitives in unseen ways (compositional extrapolation). We further extend RTE to sequence prediction, integrating it into fine-tuning algorithms for foundation models. Across empirical studies, we find that RTE substantially outperforms existing approaches on extrapolation to novel, unseen tasks.
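To make the anchor-plus-transformation decomposition concrete, here is a minimal sketch on a hypothetical toy family of shift tasks f_theta(x) = x + theta, with training tasks restricted to theta in [0, 1]. The task family, the linear form of the relational operator, and all function names (`task`, `fit_relational_operator`, `predict_target`) are assumptions for illustration; the abstract does not specify RTE's operator, and this is not the authors' implementation. The operator is fitted only on pairs of training tasks, then applied to an out-of-range target (theta = 1.8) by pairing an in-range anchor with an in-range transformation.

```python
# Minimal sketch (hypothetical toy setup, not the RTE architecture):
# tasks are shifts f_theta(x) = x + theta, and the relational operator
# is assumed to be linear: y_target ~ w1 * y_anchor + w2 * delta.

def task(theta):
    """Return the toy task f_theta(x) = x + theta."""
    return lambda x: x + theta


def fit_relational_operator(train_thetas, xs):
    """Fit (w1, w2) by least squares on pairs of *training* tasks only."""
    rows = []
    for a in train_thetas:          # anchor task
        for b in train_thetas:      # target task, also seen in training
            delta = b - a           # transformation linking anchor -> target
            for x in xs:
                rows.append((task(a)(x), delta, task(b)(x)))
    # Solve the 2x2 normal equations with Cramer's rule.
    s11 = sum(ya * ya for ya, _, _ in rows)
    s12 = sum(ya * d for ya, d, _ in rows)
    s22 = sum(d * d for _, d, _ in rows)
    t1 = sum(ya * yb for ya, _, yb in rows)
    t2 = sum(d * yb for _, d, yb in rows)
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2


def predict_target(weights, anchor_theta, delta, x):
    """Apply the learned operator to an (anchor, transformation) pair."""
    w1, w2 = weights
    return w1 * task(anchor_theta)(x) + w2 * delta


train_thetas = [0.0, 0.25, 0.5, 0.75, 1.0]   # training range: [0, 1]
weights = fit_relational_operator(train_thetas, xs=[-1.0, 0.0, 1.0, 2.0])

# Target theta = 1.8 lies outside [0, 1], but it decomposes into an
# in-range anchor (1.0) and an in-range transformation (0.8).
print(predict_target(weights, anchor_theta=1.0, delta=0.8, x=3.0))  # approx 4.8
```

The point of the toy is that neither the anchor nor the transformation is out of distribution at test time; only their combination is. In this shift family the learned weights come out as w1 = w2 = 1, so the prediction matches the true value 3 + 1.8.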

large language model, machine learning, natural language

arXiv.org Machine Learning

2605.30132

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

afbe068bd0469f4cd778c0f8106181b6-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 11:59:09 GMT

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

fb7451e43f9c1c35b774bcfad7a5714b-Paper-Conference.pdf

Neural Information Processing Systems, Feb-13-2026, 01:16:11 GMT

arxiv preprint arxiv, generalization, length generalization

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

DeepThinking3NeurIPS2022

Avi Schwarzschild

Neural Information Processing Systems, Feb-10-2026, 05:44:30 GMT

extrapolation, iteration, maze

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling

Neural Information Processing Systems, Dec-24-2025, 08:42:38 GMT

Integrating physics models within machine learning models holds considerable promise for learning robust models with improved interpretability and the ability to extrapolate. In this work, we focus on integrating incomplete physics models into deep generative models. In particular, we introduce a variational autoencoder (VAE) architecture in which part of the latent space is grounded by physics. A key technical challenge is to strike a balance between the incomplete physics and trainable components such as neural networks, so that the physics part is used in a meaningful manner. To this end, we propose a regularized learning method that controls the effect of the trainable components and preserves the intended semantics of the physics-based latent variables. We not only demonstrate generative performance improvements on a set of synthetic and real-world datasets, but also show that the learned models are robust and consistently extrapolate beyond the training distribution in a meaningful manner. Moreover, we show that the generative process can be controlled in an interpretable manner.
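The balance between an incomplete physics model and a trainable component can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions: only the decoder side is shown (encoder, latent sampling, and KL terms are omitted), `omega` plays the role of a physics-grounded latent variable, a linear function of time stands in for the neural network, and the penalty on the correction's magnitude is one simple way to limit the trainable part. It is not necessarily the regularizer proposed in the paper, and all names are hypothetical.

```python
import math

# Hypothetical toy decoder: an incomplete physics model plus a trainable
# correction. The penalty on the correction is an assumed, simple way to
# keep the physics term responsible for explaining the data.

def physics_model(omega, ts):
    """Incomplete physics: undamped oscillator x(t) = cos(omega * t)."""
    return [math.cos(omega * t) for t in ts]


def correction(params, ts):
    """Stand-in for the neural component: a linear function a * t + b."""
    a, b = params
    return [a * t + b for t in ts]


def decode(omega, params, ts):
    """Physics output plus trainable correction."""
    return [p + c for p, c in zip(physics_model(omega, ts), correction(params, ts))]


def regularized_loss(x_obs, omega, params, ts, lam):
    """Mean squared reconstruction error plus lam * mean squared correction."""
    n = len(ts)
    x_hat = decode(omega, params, ts)
    recon = sum((o - h) ** 2 for o, h in zip(x_obs, x_hat)) / n
    penalty = sum(c ** 2 for c in correction(params, ts)) / n
    return recon + lam * penalty


ts = [0.0, 0.5, 1.0]
x_obs = physics_model(2.0, ts)  # data the physics alone explains exactly
print(regularized_loss(x_obs, 2.0, (0.0, 0.0), ts, lam=0.1))  # 0.0
print(regularized_loss(x_obs, 2.0, (0.0, 1.0), ts, lam=0.1))  # approx 1.1
```

Raising `lam` makes any nonzero correction more expensive, so an optimizer would prefer to explain the data through the physics latent whenever it can; with `lam = 0` nothing stops the correction from absorbing everything and the physics latent losing its meaning.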

name change, physics-integrated variational autoencoder, robust and interpretable generative modeling

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Arithmetic Logic Units

Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, Phil Blunsom

Neural Information Processing Systems, Nov-20-2025, 14:22:57 GMT

This failure pattern indicates that the learned behavior is better characterized by memorization than by systematic abstraction.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities

Datseris, Aleksis, Vassileva, Sylvia, Koychev, Ivan, Boytcheva, Svetla

arXiv.org Artificial Intelligence, Oct-6-2025

This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE): an absolute positional embedding method that can extrapolate to sequences longer than those it was trained on. Traditional transformer models rely on absolute or relative position embeddings to incorporate positional information into token embeddings, and these often struggle to extrapolate to sequences longer than those seen during training. Our proposed method uses a novel embedding strategy that encodes exact positional information by overriding specific dimensions of the embedding vectors, enabling a more precise representation of token positions. The approach not only maintains the integrity of the original embeddings but also improves the model's ability to generalize to longer sequences. In causal language modeling, our ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
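The core mechanism described above, overriding specific dimensions of each embedding with exact positional information, can be sketched as follows. The abstract does not say which dimensions are overridden or what values are written, so both are assumptions here: the sketch reserves the last `n_reserved` dimensions and writes `pos * scale` into them. `apply_exact_positions`, `n_reserved`, and `scale` are hypothetical names, and this is not the encoding defined in the paper.

```python
from typing import List

# Hypothetical sketch: reserve the last `n_reserved` dimensions of every token
# embedding and overwrite them with a deterministic function of the absolute
# position. The choice of dimensions and of the value `pos * scale` are
# assumptions for illustration, not the encoding defined in the paper.

def apply_exact_positions(embeddings: List[List[float]],
                          n_reserved: int = 1,
                          scale: float = 1.0) -> List[List[float]]:
    out = []
    for pos, vec in enumerate(embeddings):
        if n_reserved > len(vec):
            raise ValueError("n_reserved exceeds the embedding size")
        new_vec = list(vec)  # copy so the caller's embeddings are not mutated
        for j in range(len(vec) - n_reserved, len(vec)):
            new_vec[j] = pos * scale  # exact position, defined for any length
        out.append(new_vec)
    return out


# Works for any sequence length, including lengths never seen in training.
tokens = [[0.5, -0.5, 0.0] for _ in range(10)]
encoded = apply_exact_positions(tokens, n_reserved=1, scale=1.0)
print(encoded[9])  # [0.5, -0.5, 9.0]
```

Because the written value is computed from the integer position rather than looked up in a fixed-size learned table, it is defined at every position; the remaining dimensions are left untouched, which is one reading of "maintains the integrity of the original embeddings".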

information, large language model, machine learning

arXiv.org Artificial Intelligence

2509.19569

Country: Asia > Middle East (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

We thank all the reviewers for their constructive comments

Neural Information Processing Systems, Oct-3-2025, 03:21:37 GMT

We thank all the reviewers for their constructive comments. Making predictions directly on a pixel level without the intermediate structures won't be […] Still, we follow the reviewers' suggestion by including an additional baseline that predicts directly over the pixels. The above figure shows the results. Dreamer's prediction deviates from the ground truth and quickly becomes blurry. Baselines, even with graph-structured prediction models, cannot cope with such out-of-distribution generalization. Applicability of the proposed method (R4, R1).

artificial intelligence, constructive comment, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Selective Underfitting in Diffusion Models

Song, Kiwhan, Kim, Jaeyeon, Chen, Sitan, Du, Yilun, Kakade, Sham, Sitzmann, Vincent

arXiv.org Artificial Intelligence, Oct-3-2025

Diffusion models have emerged as the principal paradigm for generative modeling across various domains. During training, they learn the score function, which in turn is used to generate samples at inference. This raises a basic yet unsolved question: which score do they actually learn? In principle, a diffusion model that matches the empirical score over the entire data space would simply reproduce the training data, failing to generate novel samples. Recent work addresses this question by arguing that diffusion models underfit the empirical score due to training-time inductive biases. In this work, we refine this perspective, introducing the notion of selective underfitting: instead of underfitting the score everywhere, better diffusion models more accurately approximate the score in certain regions of input space, while underfitting it in others. We characterize these regions and design empirical interventions to validate our perspective. Our results establish that selective underfitting is essential for understanding diffusion models, yielding new, testable insights into their generalization and generative performance.
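The "empirical score" mentioned above is, in the standard formulation, the score of the training set smoothed with Gaussian noise at level sigma: for training points x_i, it equals the sum over i of w_i(x) * (x_i - x) / sigma^2, where the weights w_i(x) are a softmax of -||x - x_i||^2 / (2 sigma^2). The sketch below computes this textbook quantity for small vectors; it is not code from the paper and does not implement the selective-underfitting analysis. The function name `empirical_score` is a hypothetical choice.

```python
import math
from typing import List, Sequence

def empirical_score(train: Sequence[Sequence[float]],
                    x: Sequence[float], sigma: float) -> List[float]:
    """Score of the training set smoothed by N(0, sigma^2 I):
    sum_i w_i(x) * (x_i - x) / sigma^2, with softmax weights w_i(x)."""
    # Log-weights -||x - x_i||^2 / (2 sigma^2), stabilized by subtracting the max.
    logits = [-sum((xj - tj) ** 2 for xj, tj in zip(x, t)) / (2 * sigma ** 2)
              for t in train]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    score = [0.0] * len(x)
    for e, t in zip(exps, train):
        w = e / z
        for j in range(len(x)):
            score[j] += w * (t[j] - x[j]) / sigma ** 2
    return score


train = [[-1.0], [1.0]]
print(empirical_score(train, [0.0], sigma=0.5))  # [0.0]: equidistant, pulls cancel
print(empirical_score(train, [0.9], sigma=0.1))  # approx [10.0]: pulled toward x_i = 1
```

Near a training point the weight of that point approaches 1 and the score points straight at it, which is why a model that matched this quantity everywhere would, as the abstract notes, collapse onto the training data rather than generate novel samples.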

artificial intelligence, diffusion model, machine learning

arXiv.org Artificial Intelligence

2510.01378

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology: