neural network pruning
To estimate the impact of removing a parameter, these methods often use importance measures that were originally designed to prune neural networks. If this hypothesis is true, it has great potential to convert the inefficient training process on a large network into a scalable training process over a small one with comparable test accuracy. Most existing LTH techniques provide empirical evidence to verify the LTH, and these methods report very intriguing observations [71, 12, 1, 47, 69, 54, 5, 53, 26, 8, 7, 11]. However, multiple cycles of training and pruning over large neural networks are time-consuming. Two recent works analyze the LTH's transferability, i.e., whether a ticket discovered on one source task can be transferred to another target task [44, 43].
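One of the most common importance measures in this line of work is simple weight magnitude: parameters with small absolute value are assumed removable. A minimal NumPy sketch (the helper name `magnitude_prune_mask` is hypothetical, not from any of the cited works) of selecting which weights to keep at a given sparsity:

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude weights.

    `sparsity` is the fraction of weights to remove; magnitude |w|
    serves as the importance score, as in magnitude-based pruning.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest magnitude becomes the removal threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = magnitude_prune_mask(w, sparsity=0.5)
print(mask.mean())  # fraction of weights kept
```

In an iterative lottery-ticket-style pipeline, such a mask would be recomputed after each train-prune cycle and applied to the rewound initial weights.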
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model. Namely, that pruning makes recall relatively worse for a class with recall below accuracy and, conversely, that it makes recall relatively better for a class with recall above accuracy. In addition, we propose a new pruning algorithm aimed at attenuating such an effect. Through statistical analysis, we have observed that intensification is less severe with our algorithm but nevertheless more pronounced with relatively more difficult tasks, less complex models, and higher pruning ratios. More surprisingly, we conversely observe a de-intensification effect with lower pruning ratios.
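The hypothesized intensification effect can be checked numerically: for each class, compare the gap between per-class recall and overall accuracy before and after pruning, and test whether the gap widens in its existing direction. A small sketch with synthetic predictions (the helper `recall_gaps` and the toy labels are illustrative, not the paper's algorithm):

```python
import numpy as np

def recall_gaps(y_true, y_pred, n_classes):
    """Per-class recall minus overall accuracy."""
    acc = np.mean(y_pred == y_true)
    recalls = np.array([np.mean(y_pred[y_true == c] == c)
                        for c in range(n_classes)])
    return recalls - acc

y_true      = np.array([0, 0, 0, 1, 1, 1])
pred_dense  = np.array([0, 0, 0, 1, 1, 0])  # class-1 recall below accuracy
pred_pruned = np.array([0, 0, 0, 1, 0, 0])  # class-1 recall drops further

gap_dense  = recall_gaps(y_true, pred_dense, 2)
gap_pruned = recall_gaps(y_true, pred_pruned, 2)
# Intensification: the recall-accuracy gap grows in its existing direction.
intensified = np.sign(gap_pruned - gap_dense) == np.sign(gap_dense)
print(intensified)  # both classes intensified in this toy example
```

Here class 0 (recall above accuracy) moves further above, and class 1 (recall below accuracy) moves further below, matching the hypothesized pattern.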
The Generalization-Stability Tradeoff In Neural Network Pruning
Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we analyze the behavior of pruning over the course of training, finding that pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately following pruning). We demonstrate that this "generalization-stability tradeoff" is present across a wide variety of pruning settings and propose a mechanism for its cause: pruning regularizes similarly to noise injection. Supporting this, we find that less pruning stability leads to more model flatness and that the benefits of pruning do not depend on permanent parameter removal. These results explain the compatibility of pruning-based generalization improvements and the high generalization recently observed in overparameterized networks.
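The instability measure in this abstract is straightforward to compute: it is the drop in test accuracy measured immediately before versus immediately after each pruning event. A minimal sketch (the function name and the accuracy pairs are hypothetical illustrations, not the paper's data):

```python
def pruning_instability(acc_before, acc_after):
    # Drop in test accuracy immediately following a pruning event;
    # larger values mean the pruning step was less stable.
    return round(acc_before - acc_after, 4)

# (before, after) test accuracies around three hypothetical pruning events
events = [(0.91, 0.88), (0.92, 0.90), (0.93, 0.93)]
instabilities = [pruning_instability(b, a) for b, a in events]
print(instabilities)  # [0.03, 0.02, 0.0]

# Averaging over events summarizes how unstable a pruning schedule was.
mean_instability = sum(instabilities) / len(instabilities)
```

Under the abstract's hypothesis, schedules with larger mean instability would tend to show larger generalization gains, all else equal.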
Review for NeurIPS paper: The Generalization-Stability Tradeoff In Neural Network Pruning
Weaknesses: My major concern is around the experimental settings, which are somewhat artificial in my opinion, and thus make me question the generality of their approach. In particular, I would like to see additional experiments around the following aspects. They don't use weight regularization and only show results using Adam. While I understand the reasoning for this choice and it is probably important in order to amplify the effect of their observation, I would appreciate additional experiments using standard training pipelines, including dropout, data augmentation, and weight regularization. For the same reason as above, it makes me question the general applicability of their observations.
Review for NeurIPS paper: The Generalization-Stability Tradeoff In Neural Network Pruning
The paper studies the effect of pruning on the generalization ability of neural networks. It introduces a notion of pruning instability (which measures the closeness to the original function, i.e., the drop in accuracy after pruning) and shows that instability relates positively to the generalization of neural networks. The paper is purely empirical, and while the reviewers initially had some concerns regarding the choice of architectures, hyperparameters, and datasets, some of these concerns were properly addressed in the rebuttal. Overall, the paper introduces an interesting view on pruning which is backed up to a large extent by the experimental results. The reviewers agree that some aspects could be improved and have made many suggestions. I recommend acceptance, but I also strongly encourage the authors to revise the paper according to the reviews to maximize its potential impact.