lasso
Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation
Tanzawa, Haruka, Sakata, Ayaka
We study high-dimensional LASSO under differential privacy via objective perturbation with heterogeneous covariate scales. In practical scenarios, covariates often exhibit diverse scales; however, standard preprocessing is problematic under privacy constraints, as it consumes additional privacy budget. This heterogeneity induces effective anisotropy in the objective perturbation via the inverse Gram matrix of covariates, which can degrade the stability and accuracy of algorithms. To address this, we propose a Gram-based anisotropic objective perturbation, a ``pre-distortion" strategy that counteracts the distortion from the covariate structure to restore isotropy in the estimation process. Using an Approximate Message Passing (AMP) framework and state evolution analysis, we demonstrate that our proposed perturbation significantly stabilizes convergence and improves both statistical efficiency and privacy performance compared to standard uniform noise injection. Our results provide theoretical insights into designing stable and efficient private estimators without relying on data-dependent preprocessing.
Adaptive Norm-Based Regularization for Neural Networks
Qasim, Muhammad, Javed, Farrukh
In this paper, we study norm-based regularization methods for neural networks. We compare existing penalization approaches and introduce two regularization strategies that extend classical ridge- and lasso-type penalties to neural network models. The first strategy modifies weight decay by incorporating the covariance structure of the input features into a ridge-type $\ell_2$ penalty, allowing regularization to account for feature dependence. The second combines an $\ell_1$ sparsity penalty with covariance-aware $\ell_2$ regularization, producing neural network weights that are both sparse and structurally informed. Monte Carlo simulations are used to evaluate these methods under different data-generating settings, followed by two real-data applications on building cooling-load prediction and leukemia cell-type classification from high-dimensional gene expression data. Across simulated and real-data examples, the proposed regularizers improve predictive performance on unseen data and provide more effective complexity control than standard norm-based penalties, particularly when features are correlated or high-dimensional.
Supplementary to Smooth Bilevel Programming for Sparse Regularization Clarice Poon, Gabriel Peyrรฉ APseudocode for gradient descent implementation
Note that f(ฮฒt) = gt is computed either as in line 5 or line 9 of the algorithm and one can use these computations for any gradient based algorithm (e.g. Note also that this is simply gradient descent on a smooth function, and one can apply typical methods to choosing the stepsize ฮณk, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988]. Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso. 1 initialization v0 Rn (with no zero entries), stepsize ฮณt > 0; Result: ฮฒt 2 while not converged do 3 if n6 mand ฮป>0 then 4 ut = diag(vt)X>Xdiag(vt) + ฮปId To show that i) implies ii), recall that a convex, proper and lower semicontinuous function ฯ can be written in terms of its convex conjugate which has domain Rd . For the expression of ฯwhen Ris a norm,from the above, we know that ฯ = ( ฯ) ( z), and recall that for any norm, R(ฮฒ) = maxR (w)61hw, ฮฒi. We derive some properties of the function h: Lemma 1.
PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits
In sparse linear bandits, a learning agent sequentially selects an action and receive reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This has applications in many real-world sequential decision making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called POPART that enjoys a tighter โ1 recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art (Hao et al., 2020), especially w.r.t. the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.
Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks
Knight, Benjamin S., Bajaj, Ahsaas
This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks -- Ridge, Lasso, ElasticNet, and Post-Lasso OLS -- across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature ratio is sufficient (n/p >= 78), Ridge, Lasso, and ElasticNet are nearly interchangeable. However, we find that Lasso recall is highly fragile under multicollinearity; at high condition numbers (kappa) and low SNR, Lasso recall collapses to 0.18 while ElasticNet maintains 0.93. Consequently, we advise practitioners against using Lasso or Post-Lasso OLS at high kappa with small sample sizes. The analysis concludes with an objective-driven decision guide to assist machine learning engineers in selecting the optimal scikit-learn-supported framework based on observable feature space attributes.
From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators
Adusumilli, Karun, Kasy, Maximilian, Wilson, Ashia
We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.