–Neural Information Processing Systems
This pruning algorithm then assigns an importance score $\|\frac{dR}{dw^l} w^l\|$ to each weight and removes the weights receiving the lowest such scores. In Figure 8, we plot the generalization of the family of models each aforementioned algorithm generates as a function of sparsity and training time in epochs. In Section 1, we show that the augmented training algorithm produces VGG-16 models with generalization that is indistinguishable from that of models produced by pruning with learning rate rewinding.

We refer to the top K% of training examples whose training loss improves the most during pruning as the top-improved examples. To examine the influence of these top-improved examples on generalization, for each sparsity pruning reaches, we train two dense models on two datasets respectively: a) the original training dataset excluding the top-improved examples at the specified sparsity, which we denote as TIE (Top-Improved Examples); b).
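The two mechanical steps above (scoring and removing weights by $\|\frac{dR}{dw^l} w^l\|$, and selecting the top-improved examples) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the function and variable names are ours, and the score here is the per-weight magnitude of gradient times weight.

```python
import numpy as np

def prune_by_saliency(weights, grads, sparsity):
    """Zero out the `sparsity` fraction of weights with the lowest
    importance score |dR/dw * w| (a sketch of the gradient-times-weight
    saliency described in the text)."""
    scores = np.abs(grads * weights)
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest score is the pruning cutoff
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = scores > threshold  # keep only weights scoring above the cutoff
    return weights * mask

def top_improved_indices(loss_before, loss_after, top_pct):
    """Indices of the top `top_pct` fraction of examples whose training
    loss improves the most during pruning (hypothetical helper for
    constructing the TIE-excluded dataset)."""
    improvement = loss_before - loss_after
    k = max(1, int(top_pct * improvement.size))
    return np.argsort(-improvement)[:k]

# Toy usage: prune half the weights, then find the top 25% improved examples.
w = np.array([0.5, -0.1, 2.0, 0.05])
g = np.array([1.0, 0.2, -0.1, 3.0])
pruned = prune_by_saliency(w, g, sparsity=0.5)

loss_before = np.array([1.0, 0.4, 0.9, 0.2])
loss_after = np.array([0.2, 0.3, 0.1, 0.2])
tie = top_improved_indices(loss_before, loss_after, top_pct=0.25)
```

The excluded training set a) would then be every example index not in `tie`, at the sparsity level where the per-example losses were measured.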