AITopics | regularization strength

Collaborating Authors

regularization strength

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal Rates in Continual Linear Regression via Increasing Regularization

Neural Information Processing SystemsJun-17-2026, 10:42:41 GMT

We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worstcase expected loss after k learning iterations admits a lower bound of Ω(1/k). However, prior work using an unregularized scheme has only established an upper bound of O(1/k1/4), leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic ℓ2 regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of O(logk/k). Moreover, formalizing and analyzing a generalized variant of SGD for time-varying functions, we derive an increasing regularization strength schedule that provably achieves an optimal rate of O(1/k). This suggests that schedules that increase the regularization coefficient or decrease the number of steps per task are beneficial, at least in the worst case.

artificial intelligence, machine learning, regularization, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Optimal ridge regularization revisited

Timmermans, Jack, Alvarez, Sergio A.

arXiv.org Machine LearningMay-28-2026

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.

artificial intelligence, machine learning, regularization, (16 more...)

arXiv.org Machine Learning

2605.28679

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

4b26dc4663ccf960c8538d595d0a1d3a-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 18:37:20 GMT

explanation, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Cardinality-Regularized Hawkes-Granger Model

Neural Information Processing SystemsApr-24-2026, 20:32:47 GMT

This section provides parameter estimation equations in the MM procedure Eq. (13) for the baseline intensity µand the decay parameter β, which were omitted in the main text due to space limitations. Below, we provide results for the exponential and power distributions. This section describes the details of the experiments. We have included the Sparse5and Dense10 data sets and the Python code to generate those as part of the final submission. B.1 Data generation Sparse5 The Sparse5 benchmark dataset is designed to have a simplest but nontrivial kind of causal structure, which is supposed to be easily reproduced by any Granger-causal learning algorithms.

accuracy, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

Variational Garrote for Sparse Inverse Problems

Lee, Kanghun, Soh, Hyungjoon, Jo, Junghyo

arXiv.org Machine LearningMar-16-2026

Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers correspond to different prior assumptions about the structure of the unknown signal, and reconstruction performance depends on how well these priors match the intrinsic sparsity of the data. This work investigates the effect of sparsity priors in inverse problems by comparing conventional L1 regularization with the Variational Garrote (VG), a probabilistic method that approximates L0 sparsity through variational binary gating variables. A unified experimental framework is constructed across multiple reconstruction tasks including signal resampling, signal denoising, and sparse-view computed tomography. To enable consistent comparison across models with different parameterizations, regularization strength is swept across wide ranges and reconstruction behavior is analyzed through train-generalization error curves. Experiments reveal characteristic bias-variance tradeoff patterns across tasks and demonstrate that VG frequently achieves lower minimum generalization error and improved stability in strongly underdetermined regimes where accurate support recovery is critical. These results suggest that sparsity priors closer to spike-and-slab structure can provide advantages when the underlying coefficient distribution is strongly sparse. The study highlights the importance of prior-data alignment in sparse inverse problems and provides empirical insights into the behavior of variational L0-type methods across different information bottlenecks.

artificial intelligence, data quality, machine learning, (18 more...)

arXiv.org Machine Learning

2603.12562

Country: