spred: Solving $L_1$ Penalty with SGD
We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes previous results showing that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, *spred*, is an exact differentiable solver of $L_1$ and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks for gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, where previous attempts to apply the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
arXiv.org Artificial Intelligence
Jul-12-2023
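To make the reparametrization idea concrete, here is a minimal sketch assuming the Hadamard-product form $w = u \odot v$ used in this line of work: coordinatewise, $\min_{uv = w} \tfrac{\lambda}{2}(u^2 + v^2) = \lambda |w|$, so plain $L_2$ weight decay on $(u, v)$ acts as an $L_1$ penalty on the effective weight $w$. The toy problem, variable names, and hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
import torch

torch.manual_seed(0)

# Toy sparse regression problem: y = X @ w_true + noise, with w_true mostly zero.
n, d = 200, 50
X = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:5] = torch.randn(5)
y = X @ w_true + 0.01 * torch.randn(n)

# Redundant parametrization of the weight vector: w = u * v (elementwise).
u = torch.randn(d, requires_grad=True)
v = torch.randn(d, requires_grad=True)

# weight_decay=lam adds lam*u and lam*v to the gradients, i.e. the gradient of
# (lam/2) * (||u||^2 + ||v||^2), which is equivalent to an L1 penalty of
# strength lam on w = u * v at the minimum over the redundant directions.
lam = 0.05
opt = torch.optim.SGD([u, v], lr=0.01, weight_decay=lam)

for step in range(5000):
    opt.zero_grad()
    w = u * v                           # effective (implicitly L1-penalized) weights
    loss = ((X @ w - y) ** 2).mean()    # smooth objective; no explicit L1 term
    loss.backward()
    opt.step()

w = (u * v).detach()
print("nonzero coordinates recovered:", (w.abs() > 1e-3).sum().item())
```

Note the design point this sketch illustrates: the objective passed to the optimizer is fully differentiable, so any off-the-shelf SGD variant applies unchanged; the sparsity emerges from the interaction of the product parametrization with weight decay rather than from a proximal or subgradient step.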