Appendix for Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation
Neural Information Processing Systems
A.1 Assumptions

We provide grounds for the assumptions in this section. Assumption 1 is a fine-grained version of the standard Lipschitz smoothness; we follow [1] in making this assumption. Assumption 2 combines the assumptions of an unbiased stochastic gradient and bounded variance, both of which are standard in the literature.

[Table caption:] All reported results are top-1 accuracy (%). The memory footprint of training at a batch size of 128 (computed on CIFAR-100) is reported in the second column.
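For reference, the standard forms that these assumptions refine are typically written as follows; this is a sketch using generic notation (f, g, L, \sigma), and the paper's exact constants and indexing may differ:

```latex
% Standard Lipschitz smoothness of the objective f (Assumption 1 refines this):
\[
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\|, \qquad \forall\, x, y .
\]

% Unbiased stochastic gradient g with bounded variance (Assumption 2 combines these):
\[
\mathbb{E}\!\left[g(x)\right] = \nabla f(x),
\qquad
\mathbb{E}\!\left[\|g(x) - \nabla f(x)\|^{2}\right] \le \sigma^{2}.
\]
```

Under these two conditions, standard SGD-style convergence analyses apply, which is why such assumptions are common in the sparsified-backpropagation literature.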