AITopics

2605.01492

Country:

North America > United States (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Qasim, Muhammad, Javed, Farrukh

Adaptive Norm-Based Regularization for Neural Networks

arXiv.org Machine LearningMay-4-2026

In this paper, we study norm-based regularization methods for neural networks. We compare existing penalization approaches and introduce two regularization strategies that extend classical ridge- and lasso-type penalties to neural network models. The first strategy modifies weight decay by incorporating the covariance structure of the input features into a ridge-type $\ell_2$ penalty, allowing regularization to account for feature dependence. The second combines an $\ell_1$ sparsity penalty with covariance-aware $\ell_2$ regularization, producing neural network weights that are both sparse and structurally informed. Monte Carlo simulations are used to evaluate these methods under different data-generating settings, followed by two real-data applications on building cooling-load prediction and leukemia cell-type classification from high-dimensional gene expression data. Across simulated and real-data examples, the proposed regularizers improve predictive performance on unseen data and provide more effective complexity control than standard norm-based penalties, particularly when features are correlated or high-dimensional.

artificial intelligence, cov, machine learning, (17 more...)

2605.00171

Country: Europe > Sweden (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsMay-1-2026, 01:41:33 GMT

Supplementary to Smooth Bilevel Programming for Sparse Regularization Clarice Poon, Gabriel Peyré APseudocode for gradient descent implementation

Note that f(βt) = gt is computed either as in line 5 or line 9 of the algorithm and one can use these computations for any gradient based algorithm (e.g. Note also that this is simply gradient descent on a smooth function, and one can apply typical methods to choosing the stepsize γk, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988]. Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso. 1 initialization v0 Rn (with no zero entries), stepsize γt > 0; Result: βt 2 while not converged do 3 if n6 mand λ>0 then 4 ut = diag(vt)X>Xdiag(vt) + λId To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ϕ can be written in terms of its convex conjugate which has domain Rd . For the expression of ψwhen Ris a norm,from the above, we know that ψ = ( ϕ) ( z), and recall that for any norm, R(β) = maxR (w)61hw, βi. We derive some properties of the function h: Lemma 1.

algorithm, artificial intelligence, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)

Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

Neural Information Processing SystemsApr-30-2026, 23:09:41 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, lasso, machine learning, (18 more...)

Country: Europe (0.46)

Genre: Research Report > Experimental Study (0.95)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Neural Information Processing SystemsApr-24-2026, 13:55:27 GMT

PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits

In sparse linear bandits, a learning agent sequentially selects an action and receive reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This has applications in many real-world sequential decision making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called POPART that enjoys a tighter ℓ1 recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art (Hao et al., 2020), especially w.r.t. the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.

machine learning, popart, reinforcement learning, (16 more...)

Country: North America > United States (0.15)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Xiangyu Wang, David B. Dunson, Chenlei Leng

DECOrrelated feature space partitioning for distributed sparse regression

Neural Information Processing SystemsApr-22-2026, 07:03:00 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, subset, (17 more...)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Knight, Benjamin S., Bajaj, Ahsaas

Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks

arXiv.org Machine LearningApr-8-2026

This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks -- Ridge, Lasso, ElasticNet, and Post-Lasso OLS -- across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature ratio is sufficient (n/p >= 78), Ridge, Lasso, and ElasticNet are nearly interchangeable. However, we find that Lasso recall is highly fragile under multicollinearity; at high condition numbers (kappa) and low SNR, Lasso recall collapses to 0.18 while ElasticNet maintains 0.93. Consequently, we advise practitioners against using Lasso or Post-Lasso OLS at high kappa with small sample sizes. The analysis concludes with an objective-driven decision guide to assist machine learning engineers in selecting the optimal scikit-learn-supported framework based on observable feature space attributes.

artificial intelligence, lasso, machine learning, (18 more...)

2604.03541

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Adusumilli, Karun, Kasy, Maximilian, Wilson, Ashia

From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

arXiv.org Machine LearningMar-24-2026

We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.

artificial intelligence, estimator, machine learning, (19 more...)