
Neural Information Processing Systems 

"Training efficiency comparison": 1) Because splitting GD and pruning methods work in very different fashions, it is … We will release our implementation and will add more discussion of time efficiency in the revision. We will release our code to demonstrate this after acceptance. For example, a 3×3 Conv filter with 64 input channels has d = 64 × 3 × 3 = 576.
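The value of d in the example above can be checked with a short sketch. It assumes a PyTorch-style convolution weight layout (out_channels, in_channels, kernel_h, kernel_w); the helper name `filter_dim` and the output-channel count of 128 are hypothetical and used only for illustration, not taken from our implementation.

```python
def filter_dim(weight_shape):
    """Number of weights in a single conv filter.

    Assumes a PyTorch-style weight layout:
    (out_channels, in_channels, kernel_h, kernel_w).
    The output-channel count does not affect d, since each
    output channel owns one filter of size in_channels * kh * kw.
    """
    _out_channels, in_channels, kernel_h, kernel_w = weight_shape
    return in_channels * kernel_h * kernel_w


# A layer of 3x3 filters with 64 input channels;
# out_channels=128 is an arbitrary placeholder.
d = filter_dim((128, 64, 3, 3))
print(d)  # 576
```

Only the input channels and kernel size enter d, which is why the example states d per filter rather than per layer.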