Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo
Markovic-Voronov, Jelena, Zhu, Wenhui, Long, Bo, Wang, Zhipeng, Gupta, Suyash, Behdin, Kayhan, Chen, Bee-Chung, Agarwal, Deepak
We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates resample-move updates with Metropolis-Hastings rejuvenation and supports block-wise generation, subsuming common decoding strategies such as temperature sampling and power-tempered objectives. Empirical results across three 7B models show significant gains. On code generation (HumanEval), our method improves base performance by up to 54.9% and surpasses the strongest sampling baselines by 9.1%-15.3%. On mathematical reasoning (MATH500), it achieves gains of up to 8.8%. Notably, it reaches 87.8% on HumanEval and 78.4% on MATH500 with Qwen2.5-7B, consistently outperforming the reinforcement learning method GRPO.
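To make the prefix-only SMC variant concrete, here is a minimal sketch under stated assumptions: `step_logprobs` and `reward` below are hypothetical stand-ins for the base language model and the prefix-dependent reward potential, the tempering constant `LAM` is illustrative, and the lookahead and resample-move/MH variants from the abstract are omitted. This is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, STEPS, N_PARTICLES, LAM = 8, 10, 16, 0.5

def step_logprobs(prefix):
    """Stand-in for the base LM: next-token log-probs given a prefix (toy)."""
    logits = rng.normal(size=VOCAB)
    return logits - np.logaddexp.reduce(logits)

def reward(prefix):
    """Stand-in prefix-dependent reward potential r(x_{1:t})."""
    return float(np.mean(prefix)) if prefix else 0.0

particles = [[] for _ in range(N_PARTICLES)]
logw = np.zeros(N_PARTICLES)

for t in range(STEPS):
    for i, p in enumerate(particles):
        lp = step_logprobs(p)
        tok = rng.choice(VOCAB, p=np.exp(lp))   # propose from the base model
        # Incremental weight: change in reward potential, tempered by LAM.
        logw[i] += (reward(p + [tok]) - reward(p)) / LAM
        p.append(tok)
    # Multinomial resampling when the effective sample size drops too low.
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w**2) < N_PARTICLES / 2:
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
        particles = [list(particles[j]) for j in idx]
        logw[:] = 0.0

print("highest-weight sample:", particles[int(np.argmax(logw))])
```

Because proposals come from the unmodified base model, only the importance weights and resampling steps steer generation toward the reward-augmented target, which is what makes the scheme training-free.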
Non-Stationarity in the Embedding Space of Time Series Foundation Models
Choi, Jinmyeong, Shook, Brad, Dubrawski, Artur
Time series foundation models (TSFMs) are widely used as generic feature extractors, yet the notion of non-stationarity in their embedding spaces remains poorly understood. Recent work often conflates non-stationarity with distribution shift, blurring distinctions fundamental to classical time-series analysis and long-standing methodologies such as statistical process control (SPC). In SPC, non-stationarity signals a process leaving a stable regime - via shifts in mean, variance, or emerging trends - and detecting such departures is central to quality monitoring and change-point analysis. Motivated by this diagnostic tradition, we study how different forms of distributional non-stationarity - mean shifts, variance changes, and linear trends - become linearly accessible in TSFM embedding spaces under controlled conditions. We further examine temporal non-stationarity arising from persistence, which reflects violations of weak stationarity due to long-memory or near-unit-root behavior rather than explicit distributional shifts. By sweeping shift strength and probing multiple TSFMs, we find that embedding-space detectability of non-stationarity degrades smoothly and that different models exhibit distinct, model-specific failure modes.
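The probing protocol can be sketched in a few lines; here `embed` is a hypothetical placeholder for a TSFM encoder (simple summary statistics stand in for real embeddings), and the synthetic mean-shift generator and shift grid are illustrative rather than the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def embed(window):
    # Placeholder for a TSFM embedding; real experiments would call a model.
    return np.array([window.mean(), window.std(), window[-1] - window[0]])

def make_window(shift, n=128):
    x = rng.normal(size=n)
    x[n // 2:] += shift            # mean shift halfway through the window
    return x

for shift in [0.25, 0.5, 1.0, 2.0]:   # sweep shift strength
    y = (rng.random(400) < 0.5).astype(int)   # 1 = shifted, 0 = stationary
    X = np.stack([embed(make_window(shift if label else 0.0)) for label in y])
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"shift={shift:4.2f}  linear-probe accuracy={acc:.2f}")
```

Probe accuracy as a function of shift strength is the "linear accessibility" curve; running the same sweep across several encoders exposes the model-specific degradation the abstract describes.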
Ordinary Least Squares is a Special Case of Transformer
The statistical essence of the Transformer architecture has long remained elusive: is it a universal approximator, or a neural-network version of known computational algorithms? Through rigorous algebraic proof, we show that the latter better describes the Transformer's basic nature: Ordinary Least Squares (OLS) is a special case of the single-layer Linear Transformer. Using the spectral decomposition of the empirical covariance matrix, we construct a specific parameter setting under which the attention mechanism's forward pass becomes mathematically equivalent to the OLS closed-form projection. This means attention can solve the regression problem in a single forward pass, without iterative optimization. Building on this prototypical case, we further uncover a decoupled slow- and fast-memory mechanism within Transformers. Finally, we discuss the evolution from our linear prototype to standard Transformers. This progression carries the Hopfield energy function from linear to exponential memory capacity, establishing a clear continuity between modern deep architectures and classical statistical inference.
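The claimed equivalence can be checked numerically with an unnormalized linear-attention pass; the parameter choice W_Q = (X^T X)^{-1}, keys = context inputs, values = context targets is an illustrative construction consistent with the abstract, not necessarily the paper's exact spectral parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.normal(size=(n, d))               # in-context inputs (keys)
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)   # in-context targets (values)
x_q = rng.normal(size=d)                  # query input

# OLS closed form: y_hat = x_q^T (X^T X)^{-1} X^T y
y_ols = x_q @ np.linalg.solve(X.T @ X, X.T @ y)

# One unnormalized linear-attention pass with W_Q = (X^T X)^{-1}:
# score_i = (W_Q x_q)^T x_i, output = sum_i score_i * y_i
q = np.linalg.solve(X.T @ X, x_q)   # W_Q applied to the query
scores = X @ q                      # one attention score per context token
y_attn = scores @ y

print(y_ols, y_attn)                # agree up to floating-point error
assert np.isclose(y_ols, y_attn)
```

The algebra behind the check is one line: sum_i (x_i^T Sigma^{-1} x_q) y_i = x_q^T Sigma^{-1} X^T y with Sigma = X^T X, i.e., the attention output is exactly the OLS prediction at the query.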
Post-Selection Distributional Model Evaluation
Farzaneh, Amirmohammad, Simeone, Osvaldo
Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-offs between performance and reliability achievable at test time. This task, which requires reliable estimation of the test-time KPI distributions, is complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, introducing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on e-values, PS-DME controls the post-selection false coverage rate (FCR) of the distributional KPI estimates and is provably more sample-efficient than a baseline method based on sample splitting. Experiments on synthetic data, text-to-SQL decoding with large language models, and telecom network performance evaluation demonstrate that PS-DME enables reliable comparison of candidate configurations across a range of reliability levels, supporting the statistically sound exploration of performance-reliability trade-offs.
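The e-value guarantee that PS-DME builds on can be illustrated with a small simulation: an e-variable is nonnegative with expectation at most 1 under the null, so rejecting when it exceeds 1/alpha controls the type-I error at alpha by Markov's inequality. The likelihood-ratio e-variable and Gaussian setup below are generic illustrations, not the PS-DME procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, trials, n = 0.05, 20000, 30

def likelihood_ratio_e(x, mu1=0.5):
    # E-variable: likelihood ratio N(mu1,1)/N(0,1) on an i.i.d. sample;
    # under the null (mu = 0) its expectation is exactly 1.
    return float(np.exp(np.sum(mu1 * x - mu1**2 / 2)))

null_e = np.array([likelihood_ratio_e(rng.normal(0.0, 1, n)) for _ in range(trials)])
alt_e  = np.array([likelihood_ratio_e(rng.normal(0.5, 1, n)) for _ in range(trials)])

print("type-I error:", np.mean(null_e >= 1 / alpha))   # stays below alpha
print("power       :", np.mean(alt_e  >= 1 / alpha))
```

This Markov-style validity survives arbitrary data-dependent selection of which e-variables to report, which is the property that lets PS-DME reuse the same data for pre-selection and FCR-controlled distributional estimation.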
0b8aff0438617c055eb55f0ba5d226fa-Supplemental.pdf
In this supplemental material, we first present the detailed network architecture and parameters of the proposed approach in Sec. A. We further provide more analysis of the proposed method and ablation studies in Sec. B. Section C shows some qualitative results for potential applications of the proposed approach to medical imaging and imaging in astronomy.

Figure 6: Illustration of learned deep features. (a) The blurry input and ground truth are shown in Figure 1(a)-(b).

However, one may actually wonder whether the feature extraction network acts as a denoiser, leading to the observed robustness of the proposed method to various noise levels.