AITopics

2606.18867

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems, Apr-30-2026, 08:54:35 GMT

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

data mining, equivalence, machine learning, (19 more...)

Country: North America > United States > California (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Feb-18-2026, 01:40:01 GMT

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

data mining, equivalence, machine learning, (19 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Dec-27-2025, 06:19:25 GMT

Generalized equivalences between subsampling and ridge regularization

We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels $\lambda$ and subsample aspect ratios $\psi$, are asymptotically equivalent along specific paths in the $(\lambda,\psi)$-plane (where $\psi$ is the ratio of the feature dimension to the subsample size). Our results require only bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of $(\lambda,\psi)$. An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.
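To make the objects in this abstract concrete, here is a minimal sketch of a subsample ridge ensemble: $M$ ridge fits, each on $k$ rows drawn without replacement, averaged together, with $\psi = p/k$. The ridge normalization $(X^\top X/k + \lambda I)^{-1} X^\top y / k$, the Gaussian toy data and all function names are assumptions made for illustration. The sketch only evaluates the estimator at a given $(\lambda,\psi)$; it does not implement the paper's data-dependent method for locating the equivalence paths.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge fit (X^T X / k + lam I)^{-1} X^T y / k on k rows (assumed normalization)."""
    k, p = X.shape
    A = X.T @ X / k + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y / k)


def subsample_ridge_ensemble(X, y, lam, k, M, rng):
    """Average of M ridge fits, each on k rows drawn without replacement.

    When k < p the per-subsample Gram matrix is singular, so lam must be > 0.
    """
    n = X.shape[0]
    betas = []
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)  # one random subsample of size k
        betas.append(ridge(X[idx], y[idx], lam))
    return np.mean(betas, axis=0)


# Hypothetical toy data: isotropic Gaussian features and a noisy linear response.
rng = np.random.default_rng(0)
n, p = 400, 100
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

k = 200
psi = p / k  # subsample aspect ratio: feature dimension / subsample size
beta_ens = subsample_ridge_ensemble(X, y, lam=0.1, k=k, M=20, rng=rng)
```

With $k = n$ and $M = 1$ the ensemble reduces to ordinary ridge on the full data, since permuting rows leaves $X^\top X$ and $X^\top y$ unchanged. The equivalence results concern how such ensembles compare across different $(\lambda, k)$ pairs.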

generalized equivalence, name change, ridge regularization, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Neural Information Processing Systems, Dec-24-2025, 22:13:58 GMT

Implicit Regularization Paths of Weighted Neural Representations

We study the implicit regularization effects induced by (observation) weighting of pretrained features. For weight and feature matrices of bounded operator norms that are infinitesimally free with respect to (normalized) trace functionals, we derive equivalence paths connecting different weighting matrices and ridge regularization levels. Specifically, we show that ridge estimators trained on weighted features along the same path are asymptotically equivalent when evaluated against test vectors of bounded norms. These paths can be interpreted as matching the effective degrees of freedom of ridge estimators fitted with weighted features. For the special case of subsampling without replacement, our results apply to independently sampled random features and kernel features and confirm recent conjectures (Conjectures 7 and 8) of the authors on the existence of such paths in Patil and Du (2023). We also present an additive risk decomposition for ensembles of weighted estimators and show that the risks are equivalent along the paths when the ensemble size goes to infinity. As a practical consequence of the path equivalences, we develop an efficient cross-validation method for tuning and apply it to subsampled pretrained representations across several models (e.g., ResNet-50) and datasets (e.g., CIFAR-100).
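The "matching effective degrees of freedom" interpretation can be illustrated with a small sketch: compute $\mathrm{df}(\lambda) = \mathrm{tr}\big(S_w (S_w + \lambda I)^{-1}\big)/p$ for the full data at a reference $\lambda$, then bisect for the penalty at which a subsample (encoded as 0/1 weights) has the same df. Normalizing the weighted Gram matrix $S_w = X^\top \mathrm{diag}(w) X$ by the total weight, and the function names, are assumptions; the paper's exact path characterization and its cross-validation method are not reproduced here.

```python
import numpy as np


def weighted_df(X, w, lam):
    """Effective degrees of freedom tr(S (S + lam I)^{-1}) / p of weighted ridge.

    Assumption: S = X^T diag(w) X / sum(w), i.e. the weighted Gram matrix is
    normalized by the total weight.
    """
    p = X.shape[1]
    S = (X * w[:, None]).T @ X / w.sum()
    eig = np.clip(np.linalg.eigvalsh(S), 0.0, None)  # S is symmetric PSD; clip round-off
    return np.sum(eig / (eig + lam)) / p


def match_lambda(X, w, target_df, lo=1e-10, hi=1e6, iters=200):
    """Bisection (on a log scale) for lam with weighted_df(X, w, lam) == target_df.

    df is decreasing in lam, so the target must lie between df(hi) and df(lo).
    """
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if weighted_df(X, w, mid) > target_df:
            lo = mid  # too many degrees of freedom: increase the penalty
        else:
            hi = mid
    return np.sqrt(lo * hi)


# Hypothetical example: full data vs. a subsample of k rows (k > p keeps S_w full rank).
rng = np.random.default_rng(1)
n, p, k = 500, 50, 150
X = rng.standard_normal((n, p))
w_full = np.ones(n)
w_sub = np.zeros(n)
w_sub[rng.choice(n, size=k, replace=False)] = 1.0  # subsampling as 0/1 weights

target = weighted_df(X, w_full, lam=1.0)
lam_sub = match_lambda(X, w_sub, target)
```

Because both sides share the same $p$, the choice of dividing df by $p$ (rather than, say, $n$) only rescales both sides by the same constant and does not change the matched $\lambda$.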

artificial intelligence, implicit regularization path, machine learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Neural Information Processing Systems, May-26-2025, 21:07:07 GMT

Implicit Regularization Paths of Weighted Neural Representations

We study the implicit regularization effects induced by (observation) weighting of pretrained features. For weight and feature matrices of bounded operator norms that are infinitesimally free with respect to (normalized) trace functionals, we derive equivalence paths connecting different weighting matrices and ridge regularization levels. Specifically, we show that ridge estimators trained on weighted features along the same path are asymptotically equivalent when evaluated against test vectors of bounded norms. These paths can be interpreted as matching the effective degrees of freedom of ridge estimators fitted with weighted features. For the special case of subsampling without replacement, our results apply to independently sampled random features and kernel features and confirm recent conjectures (Conjectures 7 and 8) of the authors on the existence of such paths in Patil and Du (2023). We also present an additive risk decomposition for ensembles of weighted estimators and show that the risks are equivalent along the paths when the ensemble size goes to infinity. As a practical consequence of the path equivalences, we develop an efficient cross-validation method for tuning and apply it to subsampled pretrained representations across several models (e.g., ResNet-50) and datasets (e.g., CIFAR-100).

artificial intelligence, machine learning, weighted neural representation, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

arXiv.org Machine Learning, May-20-2025

Transformer learns the cross-task prior and regularization for in-context learning

Lu, Fei, Yu, Yue

Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context and its utility for downstream predictions remain open questions. This paper aims to address these questions by examining ICL for inverse linear regression (ILR), where context inference can be characterized by unsupervised learning of underlying weight vectors. Focusing on the challenging scenario of rank-deficient inverse problems, where context length is smaller than the number of unknowns in the weight vectors and regularization is necessary, we introduce a linear transformer to learn the inverse mapping from contextual examples to the underlying weight vector. Our findings reveal that the transformer implicitly learns both a prior distribution and an effective regularization strategy, outperforming traditional ridge regression and regularization methods. A key insight is the necessity of low task dimensionality relative to the context length for successful learning. Furthermore, we numerically verify that the error of the transformer estimator scales linearly with the noise level, the ratio of task dimension to context length, and the condition number of the input data. These results not only demonstrate the potential of transformers for solving ill-posed inverse problems, but also provide a new perspective towards understanding the knowledge extraction mechanism within transformers.
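The abstract compares the learned transformer with traditional ridge regression in the rank-deficient regime, where the context length $n$ is smaller than the number of unknowns $d$. Below is a minimal sketch of that ridge baseline under our reading of the setup (recover a weight vector from $n$ contextual pairs $(x_i, y_i)$ with $y_i = \langle w, x_i\rangle + \text{noise}$); the Gaussian inputs, noise level and names are assumptions, and this is not the paper's linear transformer. Since $X^\top X$ has rank at most $n < d$, least squares has no unique solution and the penalty restores one; the dual form solves an $n \times n$ system instead of a $d \times d$ one.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """w = (X^T X + lam I_d)^{-1} X^T y  (solves a d x d system)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def ridge_dual(X, y, lam):
    """w = X^T (X X^T + lam I_n)^{-1} y  (solves an n x n system; cheaper when n < d)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


# Hypothetical context: n examples, d unknowns, n < d (rank-deficient inverse problem).
rng = np.random.default_rng(2)
n, d = 10, 40
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_hat = ridge_dual(X, y, lam=0.5)
```

The two forms agree by the push-through identity $(X^\top X + \lambda I)^{-1} X^\top = X^\top (X X^\top + \lambda I)^{-1}$.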

artificial intelligence, machine learning, transformer, (16 more...)

2505.12138

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Neural Information Processing Systems, Jan-20-2025, 02:58:34 GMT

Generalized equivalences between subsampling and ridge regularization

We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels $\lambda$ and subsample aspect ratios $\psi$, are asymptotically equivalent along specific paths in the $(\lambda,\psi)$-plane (where $\psi$ is the ratio of the feature dimension to the subsample size). Our results require only bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of $(\lambda,\psi)$. An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.

equivalence, generalized equivalence, ridge regularization, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

arXiv.org Machine Learning, Dec-20-2024

Lecture Notes on High Dimensional Linear Regression

Quaini, Alberto

These lecture notes were developed for a Master's course in advanced machine learning at Erasmus University of Rotterdam. The course is designed for graduate students in mathematics, statistics and econometrics. The content follows a proposition-proof structure, making it suitable for students seeking a formal and rigorous understanding of the statistical theory underlying machine learning methods. At present, the notes focus on linear regression, with an in-depth exploration of the existence, uniqueness, relations, computation, and nonasymptotic properties of the most prominent estimators in this setting: least squares, ridgeless, ridge, and lasso. Background: It is assumed that readers have a solid background in calculus, linear algebra, convex analysis, and probability theory.
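One relation among the estimators listed above can be checked numerically: when there are more features than samples, least squares has infinitely many solutions, the ridgeless estimator is the minimum-norm one, $X^{+}y$, and ridge converges to it as $\lambda \to 0$. The sketch below uses random Gaussian data chosen for illustration; it is not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 60  # more features than samples: least squares is not unique
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridgeless estimator: minimum-l2-norm least-squares solution via the pseudoinverse.
beta_ridgeless = np.linalg.pinv(X) @ y


def ridge(X, y, lam):
    """Ridge estimator (X^T X + lam I)^{-1} X^T y, written in the equivalent dual form."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


for lam in [1.0, 1e-2, 1e-4, 1e-6]:
    gap = np.linalg.norm(ridge(X, y, lam) - beta_ridgeless)
    print(f"lam={lam:g}  ||ridge - ridgeless|| = {gap:.2e}")
```

Because $X$ has full row rank here, the ridgeless solution interpolates the training data exactly, and the printed gap shrinks roughly in proportion to $\lambda$.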

artificial intelligence, estimator, machine learning, (19 more...)

2412.15633

Country:

Europe > Netherlands > South Holland > Rotterdam (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (0.68)
Energy > Power Industry (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Du, Jin-Hong, Patil, Pratik

Implicit Regularization Paths of Weighted Neural Representations

arXiv.org Machine Learning, Aug-28-2024

In recent years, neural networks have become state-of-the-art models for tasks in computer vision and natural language processing by learning rich representations from large datasets. Pretrained neural networks, such as ResNet, which are trained on massive datasets like ImageNet, serve as valuable resources for new, smaller datasets [32]. These pretrained models reduce computational burden and generalize well in tasks such as image classification and object detection due to their rich feature space [32, 69]. Furthermore, pretrained features or neural embeddings, such as the neural tangent kernel, extracted from these models, serve as valuable representations of diverse data [33, 66]. However, despite their usefulness, fitting models based on pretrained features on large datasets can be challenging due to computational and memory constraints. When dealing with high-dimensional pretrained features and large sample sizes, direct application of even simple linear regression may be computationally infeasible or memory-prohibitive [23, 44]. To address this issue, subsampling has emerged as a practical solution that reduces the dataset size, thereby alleviating the computational and memory burden. Subsampling involves creating smaller datasets by randomly selecting a subset of the original data points.
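Subsampling fits the "observation weighting" view of this paper as a special case: ridge fitted on a random subset of rows equals ridge on the full data with a 0/1 weight per observation. The sketch below checks this identity; random Gaussian features stand in for pretrained features, and the sizes and names are hypothetical. In practice one would fit on `X[idx]` directly, since forming the full weighted product defeats the memory savings that motivate subsampling.

```python
import numpy as np


def weighted_ridge(X, y, w, lam):
    """Ridge with observation weights: (X^T W X + lam I)^{-1} X^T W y, where W = diag(w)."""
    p = X.shape[1]
    Xw = X * w[:, None]  # scale each row by its weight
    return np.linalg.solve(Xw.T @ X + lam * np.eye(p), Xw.T @ y)


rng = np.random.default_rng(4)
n, p, k = 1000, 64, 100
X = rng.standard_normal((n, p))  # stand-in for n pretrained feature vectors of dimension p
y = rng.standard_normal(n)

idx = rng.choice(n, size=k, replace=False)  # subsample without replacement
w = np.zeros(n)
w[idx] = 1.0  # subsampling expressed as a 0/1 weight vector

beta_weighted = weighted_ridge(X, y, w, lam=1.0)
beta_subset = weighted_ridge(X[idx], y[idx], np.ones(k), lam=1.0)
```

Zero-weight rows contribute nothing to $X^\top W X$ or $X^\top W y$, so both fits solve the same linear system up to floating-point summation order.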

dataset, equivalence, matrix, (15 more...)