Accelerating CNN Training by Sparsifying Activation Gradients