AITopics

2605.09757

Country: Europe > Germany (0.46)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Li, Ziyan, Hiratani, Naoki

Characterizing and Correcting Effective Target Shift in Online Learning

arXiv.org Machine LearningMay-11-2026

Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, inaccurate target outputs. Conversely, we show that by compensating for this effective shift in the teaching signal through target correction, online kernel-based learning can provably learn the same predictor as its offline counterpart. We derive both a closed-form expression for this target correction and an iterative form that can be applied sequentially. Applying this framework to image classification tasks on CIFAR-10 and CORe50, we show that online stochastic gradient descent with iteratively corrected targets outperforms learning with the true targets in continual learning settings. This work therefore provides a basic framework for analyzing and improving online learning in non-stationary environments.

artificial intelligence, deep learning, machine learning, (17 more...)

2605.07886

Country: North America (0.46)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting > Online (0.84)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningApr-28-2026

Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

Lu, Weihao, Lin, Qian, Xia, Yingcun, Huang, Dongming

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the largedimensional setting where the sample size and dimension are of comparable order, i.e., n dγ for some γ > 0. We first consider inner-product kernels on the sphere Sd 1 and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions s 0, where smeasures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever sis positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in Rd whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions. Keywords: Spectral algorithms, learning curves, high dimension, benign overfitting. 1 Introduction Nonparametric regression studies the estimation of an unknown function f: Rd R from ni.i.d.

artificial intelligence, learning curve, machine learning, (18 more...)

2604.23212

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Neural Information Processing SystemsApr-25-2026, 20:19:31 GMT

Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (that relates the test error t P β to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher tis smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.

artificial intelligence, deep learning, machine learning, (18 more...)

Country: North America > United States (0.46)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Neural Information Processing SystemsApr-24-2026, 23:33:53 GMT

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either largesample asymptotics (m!1) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension (m / d). There is a wide gulf between these two regimes, including all higher-order scaling relations m / dr, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance, for data drawn uniformly from the sphere with isotropic random labels in the rth-order asymptotic scaling regime m!1 with m/dr held constant. We observe a peak in the learning curve whenever m dr/r! for any integer r, leading to multiple sample-wise descent and nontrivial behavior at multiple scales. We include a colab2 notebook that reproduces the essential results of the paper.

artificial intelligence, kernel, machine learning, (16 more...)

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Neural Information Processing SystemsApr-22-2026, 00:03:38 GMT

Error Analysis of Generalized Nyström Kernel Regression

Hong Chen, Haifeng Xia, Heng Huang, Weidong Cai

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, regression, (17 more...)

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Farné, Gabriele, Boncoraglio, Fabrizio, Zdeborová, Lenka

The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

arXiv.org Machine LearningMar-27-2026

A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while a fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule - allowing generalization to new data - and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.

artificial intelligence, generalization error, machine learning, (18 more...)

2603.25579

Country:

North America (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.33)

Industry:

Health & Medicine (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing SystemsMar-17-2026, 18:55:00 GMT

Generative Local Metric Learning for Kernel Regression

This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)