NeurIPS 2021: sparse train (camera ready)

Yuxuan Xie

Neural Information Processing Systems 

The example in Figure A.1 (d) is called a 4-entry kernel pattern, since every kernel keeps 4 non-zero weights out of the 9 weights of the original 3×3 kernel. In addition, connectivity sparsity cuts the connections between some input and output channels, which is equivalent to removing the corresponding kernels entirely.

Consider a sparse model with sparsity ratio $s \in [0, 1]$ obtained from a dense model with a total of $N$ weights, so that it retains $(1 - s)N$ non-zero weights. Unlike a dense model, a sparse model also needs indices that record where its non-zero weights and gradients sit within the dense model. Mobile edge devices generally support 8-bit fixed-point, 16-bit floating-point, and 32-bit floating-point numbers. Weights and gradients are usually stored as 16-bit or 32-bit values, whereas 8-bit or 16-bit indices are preferred because of the data storage formats available on edge devices.
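The two forms of structured sparsity described above, a fixed number of surviving weights inside each 3×3 kernel and whole kernels removed between channel pairs, can be sketched in plain Python. This is a minimal illustration, assuming a layer is stored as nested lists indexed as `weights[output_channel][input_channel][row][col]`; the mask `PATTERN_4` and the helpers `apply_pattern`, `cut_connections`, and `count_nonzeros` are hypothetical and do not come from the paper.

```python
# Minimal sketch (assumption): a layer is nested lists indexed as
# weights[out_channel][in_channel][row][col]; names here are hypothetical.

# Hypothetical 4-entry pattern: 4 of the 9 positions in a 3x3 kernel are kept.
PATTERN_4 = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]

def apply_pattern(kernel, pattern):
    """Pattern sparsity: keep only the weights where the 3x3 mask is 1."""
    return [[w if m else 0.0 for w, m in zip(k_row, p_row)]
            for k_row, p_row in zip(kernel, pattern)]

def cut_connections(weights, keep):
    """Connectivity sparsity: weights[o][i] is the 3x3 kernel from input
    channel i to output channel o; keep[o][i] == False zeroes that kernel."""
    return [[kernel if keep[o][i] else [[0.0] * 3 for _ in range(3)]
             for i, kernel in enumerate(out_row)]
            for o, out_row in enumerate(weights)]

def count_nonzeros(weights):
    """Count non-zero weights across all kernels of the layer."""
    return sum(1 for out_row in weights for kernel in out_row
               for k_row in kernel for w in k_row if w != 0.0)

# Layer with 2 output and 2 input channels, every weight non-zero.
dense = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(2)]
patterned = [[apply_pattern(k, PATTERN_4) for k in out_row] for out_row in dense]
pruned = cut_connections(patterned, keep=[[True, False], [True, True]])

print(count_nonzeros(dense))      # 36
print(count_nonzeros(patterned))  # 16
print(count_nonzeros(pruned))     # 12
```

Pattern sparsity fixes how many weights survive inside each kernel, while connectivity sparsity removes kernels as a whole, so the two compose: in this toy layer, 36 weights drop to 16 after patterning and to 12 after one connection is cut.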
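To see why the index width matters, the sketch below counts storage for a sparse model with $N$ weights and sparsity $s$. Addressing any of $N$ positions directly takes $\lceil \log_2 N \rceil$ bits, which exceeds 16 bits once $N > 65{,}536$, so fitting indices into 8 or 16 bits needs some other encoding. As an assumption, the sketch uses relative (delta) indexing with zero-valued filler entries, a common sparse-storage technique that is not necessarily the scheme used in this paper; the function names, the 16-bit value width, and the example sparsity layout are hypothetical.

```python
# Minimal sketch (assumption): relative indexing with filler entries is a
# common sparse-storage scheme, used here for illustration only.

def absolute_index_bits(n_weights):
    """Bits needed to address any of n_weights positions directly."""
    return max(1, (n_weights - 1).bit_length())

def encode_relative(positions, index_bits):
    """Encode strictly increasing positions of non-zero weights as gaps that
    fit in `index_bits` bits. A gap larger than the maximum is split by
    inserting filler entries (stored zero weights). Returns (gap, is_filler)."""
    max_gap = (1 << index_bits) - 1
    entries, prev = [], -1
    for pos in positions:
        gap = pos - prev
        while gap > max_gap:
            entries.append((max_gap, True))  # filler: a stored zero weight
            gap -= max_gap
        entries.append((gap, False))
        prev = pos
    return entries

def decode_relative(entries):
    """Recover absolute positions, skipping filler entries."""
    positions, cur = [], -1
    for gap, is_filler in entries:
        cur += gap
        if not is_filler:
            positions.append(cur)
    return positions

def storage_bits(num_entries, value_bits, index_bits):
    """Each stored entry (fillers included) holds one value and one index."""
    return num_entries * (value_bits + index_bits)

N, s = 1_000_000, 0.9
positions = list(range(0, N, 10))  # hypothetical layout: every 10th weight kept
assert len(positions) == round((1 - s) * N)  # (1 - s) * N non-zeros

entries = encode_relative(positions, index_bits=8)
assert decode_relative(entries) == positions

print(absolute_index_bits(N))                # 20
print(len(entries))                          # 100000 (no fillers needed)
print(storage_bits(len(entries), 16, 8))     # 2400000
print(storage_bits(len(positions), 16, 32))  # 4800000
print(encode_relative([0, 5, 300], 8))
# [(1, False), (5, False), (255, True), (40, False)]
```

With every 10th weight kept, no gap exceeds 255, so 8-bit relative indices need no fillers and the model takes 100,000 × (16 + 8) = 2.4 Mbit, versus 100,000 × (16 + 32) = 4.8 Mbit if the 20-bit absolute indices were held in 32-bit words. Irregular sparsity with long gaps adds filler entries, which is the price of the narrower index.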