Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
