Goto

Collaborating Authors

 regularization strength


Optimal ridge regularization revisited

arXiv.org Machine Learning

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.



Cardinality-Regularized Hawkes-Granger Model

Neural Information Processing Systems

This section provides parameter estimation equations in the MM procedure Eq. (13) for the baseline intensity ยตand the decay parameter ฮฒ, which were omitted in the main text due to space limitations. Below, we provide results for the exponential and power distributions. This section describes the details of the experiments. We have included the Sparse5and Dense10 data sets and the Python code to generate those as part of the final submission. B.1 Data generation Sparse5 The Sparse5 benchmark dataset is designed to have a simplest but nontrivial kind of causal structure, which is supposed to be easily reproduced by any Granger-causal learning algorithms.


Variational Garrote for Sparse Inverse Problems

arXiv.org Machine Learning

Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers correspond to different prior assumptions about the structure of the unknown signal, and reconstruction performance depends on how well these priors match the intrinsic sparsity of the data. This work investigates the effect of sparsity priors in inverse problems by comparing conventional L1 regularization with the Variational Garrote (VG), a probabilistic method that approximates L0 sparsity through variational binary gating variables. A unified experimental framework is constructed across multiple reconstruction tasks including signal resampling, signal denoising, and sparse-view computed tomography. To enable consistent comparison across models with different parameterizations, regularization strength is swept across wide ranges and reconstruction behavior is analyzed through train-generalization error curves. Experiments reveal characteristic bias-variance tradeoff patterns across tasks and demonstrate that VG frequently achieves lower minimum generalization error and improved stability in strongly underdetermined regimes where accurate support recovery is critical. These results suggest that sparsity priors closer to spike-and-slab structure can provide advantages when the underlying coefficient distribution is strongly sparse. The study highlights the importance of prior-data alignment in sparse inverse problems and provides empirical insights into the behavior of variational L0-type methods across different information bottlenecks.