Learning Sparse Distributions using Iterative Hard Thresholding
Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state-of-the-art performance for a wide range of structured estimation problems, such as sparse inference. In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions. We study the hardness of using IHT on the space of measures. As a practical alternative, we propose a greedy approximate projection which simultaneously captures appropriate notions of sparsity in distributions while satisfying the simplex constraint, and we investigate the convergence behavior of the resulting procedure in various settings. Our results show, both in theory and in practice, that IHT can achieve state-of-the-art results for learning sparse distributions.
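The abstract above describes IHT as projected gradient descent combined with a projection that must enforce both sparsity and the simplex constraint. The paper's greedy approximate projection is not reproduced here; as a rough illustration of the general recipe only, the sketch below takes a gradient step, keeps the k largest entries, and re-projects the retained entries onto the probability simplex. Function names such as sparse_simplex_step are ours, not the paper's.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (Duchi et al.-style)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sparse_simplex_step(p, grad, step, k):
    """One IHT-style step for a sparse distribution: gradient step, keep the k
    largest-magnitude entries, then renormalize them onto the simplex.
    Illustrative stand-in for the paper's greedy approximate projection."""
    v = p - step * grad
    idx = np.argsort(np.abs(v))[-k:]        # support of the k largest entries
    p_new = np.zeros_like(v)
    p_new[idx] = project_simplex(v[idx])    # re-project on the retained support
    return p_new
```

A caller would supply the gradient of whatever loss it uses over distributions (for instance, the negative log-likelihood of observed counts) with respect to the current iterate p.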
We would like to address all concerns raised.
We would like to thank all of the reviewers for their valuable time and their constructive comments. We will incorporate the proposed minor corrections in the final version of the paper. On whether the support set changes during iterations, we observe in our experiments (Subsection 4.1) that IHT does change the support. We thank the reviewer for the supportive and constructive review. Regarding the comment on lines 198-202, we apologize for any confusion. Regarding the variance in the experiments, we have observed that high variance alone is not enough for the algorithm to get "lucky".
A Recovery Guarantee for Sparse Neural Networks
Fridovich-Keil, Sara, Pilanci, Mert
We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning.
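For intuition only, here is a generic IHT loop on a two-layer, scalar-output ReLU network of the kind the abstract describes: a gradient step on the squared loss followed by hard thresholding that keeps the k largest-magnitude first-layer weights. The initialization, step size, and thresholding details are placeholder assumptions and do not reproduce the paper's algorithm, memory guarantees, or recovery conditions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def iht_two_layer(X, y, hidden, k, step=1e-2, iters=500, seed=0):
    """Generic IHT sketch for f(x) = a @ relu(W x) with scalar output.
    After every gradient step, all but the k largest-magnitude entries of W
    are set to zero. Illustrative only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(hidden, d))
    a = rng.normal(scale=0.1, size=hidden)
    for _ in range(iters):
        H = relu(X @ W.T)                                  # (n, hidden) activations
        r = H @ a - y                                      # residuals
        grad_a = H.T @ r / n
        grad_W = ((r[:, None] * (H > 0)) * a).T @ X / n    # chain rule through ReLU
        a -= step * grad_a
        W -= step * grad_W
        # hard thresholding: keep only the k largest-magnitude entries of W
        flat = np.abs(W).ravel()
        thresh = np.partition(flat, -k)[-k] if k < flat.size else 0.0
        W[np.abs(W) < thresh] = 0.0
    return W, a
```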
Deep greedy unfolding: Sorting out argsorting in greedy sparse recovery algorithms
Mohammad-Taheri, Sina, Colbrook, Matthew J., Brugiapaglia, Simone
Gradient-based learning requires (deep) neural networks to be differentiable at all steps. This includes model-based architectures constructed by unrolling iterations of an iterative algorithm onto layers of a neural network, known as algorithm unrolling. However, greedy sparse recovery algorithms depend on the non-differentiable argsort operator, which hinders their integration into neural networks. In this paper, we address this challenge in Orthogonal Matching Pursuit (OMP) and Iterative Hard Thresholding (IHT), two popular representative algorithms in this class. We propose permutation-based variants of these algorithms and approximate permutation matrices using "soft" permutation matrices derived from softsort, a continuous relaxation of argsort. We demonstrate -- both theoretically and numerically -- that Soft-OMP and Soft-IHT, as differentiable counterparts of OMP and IHT and fully compatible with neural network training, effectively approximate these algorithms with a controllable degree of accuracy. This leads to the development of OMP- and IHT-Net, fully trainable network architectures based on Soft-OMP and Soft-IHT, respectively. Finally, by choosing weights as "structure-aware" trainable parameters, we connect our approach to structured sparse recovery and demonstrate its ability to extract latent sparsity patterns from data.
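The key ingredient named in the abstract is softsort, a continuous relaxation of argsort. A minimal sketch of that operator, following the standard definition of Prillo & Eisenschlos (2020), is given below, together with a hypothetical soft_topk_mask helper that shows how a soft permutation can stand in for hard top-k selection inside IHT-style updates; the helper and the temperature values are our assumptions, not the paper's Soft-OMP/Soft-IHT constructions.

```python
import numpy as np

def softsort(s, tau=1.0):
    """Softsort: a row-stochastic relaxation of the permutation matrix that sorts s
    in decreasing order. Row i is a soft one-hot vector peaking at the index of the
    i-th largest entry of s; tau is a temperature controlling the sharpness."""
    s = np.asarray(s, dtype=float)
    s_sorted = np.sort(s)[::-1]
    logits = -np.abs(s_sorted[:, None] - s[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def soft_topk_mask(s, k, tau=0.1):
    """Hypothetical helper: weight each coordinate by how strongly the soft
    permutation assigns it to one of the top-k sorted positions."""
    P = softsort(np.abs(np.asarray(s, dtype=float)), tau=tau)
    return P[:k].sum(axis=0)   # approaches a 0/1 top-k mask as tau -> 0

x = np.array([0.1, -2.0, 0.5, 3.0, -0.2])
print(soft_topk_mask(x, k=2, tau=0.05))  # ~1 for the 2 largest-magnitude entries
```

As tau decreases, the soft permutation sharpens toward the true permutation matrix, trading gradient smoothness for fidelity to the original argsort.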