spred: Solving $L_1$ Penalty with SGD
We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes previous results showing that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, *spred*, is an exact differentiable solver of $L_1$ and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks for gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, where previous attempts to apply the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
arXiv.org Artificial Intelligence
Jul-12-2023
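To make the reparametrization idea concrete, here is a minimal sketch assuming the Hadamard-product form $w = u \odot v$ used in this line of work: coordinatewise, $\min_{uv = w} \tfrac{\lambda}{2}(u^2 + v^2) = \lambda |w|$, so plain $L_2$ weight decay on $(u, v)$ acts as an $L_1$ penalty on the effective weight $w$. The toy problem, variable names, and hyperparameters below are illustrative assumptions, not the paper's exact setup.

```python
import torch

torch.manual_seed(0)

# Toy sparse regression problem: y = X @ w_true + noise, with w_true mostly zero.
n, d = 200, 50
X = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:5] = torch.randn(5)
y = X @ w_true + 0.01 * torch.randn(n)

# Redundant parametrization of the weight vector: w = u * v (elementwise).
u = torch.randn(d, requires_grad=True)
v = torch.randn(d, requires_grad=True)

# weight_decay=lam adds lam*u and lam*v to the gradients, i.e. the gradient of
# (lam/2) * (||u||^2 + ||v||^2), which is equivalent to an L1 penalty of
# strength lam on w = u * v at the minimum over the redundant directions.
lam = 0.05
opt = torch.optim.SGD([u, v], lr=0.01, weight_decay=lam)

for step in range(5000):
    opt.zero_grad()
    w = u * v                           # effective (implicitly L1-penalized) weights
    loss = ((X @ w - y) ** 2).mean()    # smooth objective; no explicit L1 term
    loss.backward()
    opt.step()

w = (u * v).detach()
print("nonzero coordinates recovered:", (w.abs() > 1e-3).sum().item())
```

Note the design point this sketch illustrates: the objective passed to the optimizer is fully differentiable, so any off-the-shelf SGD variant applies unchanged; the sparsity emerges from the interaction of the product parametrization with weight decay rather than from a proximal or subgradient step.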