Neural Information Processing Systems
We thank the reviewers for their feedback. Our paper will be updated to reflect the responses below. For example, for ResNet18 on ImageNet at 50% sparsity, DSG suffers an accuracy loss of 4.6%.

Reviewer 2: (1) "Drastic drop due to sparse activations in forward pass": In Figure 1 we isolate the effect of sparse activations in the forward pass. Notably, this means we use the full activation for the backward pass. Thus, STR, CS, and GMP only update the active parameters. The L1 response of the channels is computed.
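As a rough illustration of the sparse-forward / dense-backward scheme referred to above, the following toy sketch zeroes all but the top-k activations in the forward pass while computing gradients from the full (unmasked) activation path. This is a hypothetical minimal example, not the paper's implementation; the names `topk_mask`, `forward`, and `backward` are our own.

```python
def topk_mask(values, k):
    """Return a 0/1 mask keeping only the k largest-magnitude entries."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(values))]

def forward(x, w, k):
    """Toy elementwise 'layer': dense pre-activation a = w * x,
    but only the top-k activations survive the forward pass."""
    a = [wi * xi for wi, xi in zip(w, x)]        # dense pre-activation
    mask = topk_mask(a, k)
    y = [mi * ai for mi, ai in zip(mask, a)]     # sparse activation for forward
    return y, a                                  # keep the dense a for backward

def backward(grad_y, x):
    """Backward uses the FULL activation path (no mask):
    dL/dw_i = grad_y_i * x_i for every parameter, not just active ones."""
    return [g * xi for g, xi in zip(grad_y, x)]

x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -1.0, 0.1, 0.2]
y, a = forward(x, w, k=2)            # only 2 activations survive the forward pass
grad_w = backward([1.0] * 4, x)      # yet all 4 weights receive a gradient
```

A method that instead masked the backward pass as well (as the response says STR, CS, and GMP effectively do) would multiply `grad_w` by the same top-k mask, so inactive parameters would never be updated.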
May-22-2025, 00:52:03 GMT