Review for NeurIPS paper: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts

Neural Information Processing Systems 

Weaknesses: * There are numerous approaches to reduce ConvNet's memory footprint and computational resources at inference time, including but not limited to channel pruning, dynamic computational graph, and model distillation. Why is removing shortcut connection the best way to achieve the same goal? The baselines considered in Table 3 and 4 are rather lacking. For example, how does the proposed method compare to: 1. Pruning method that reduces ResNet-50 channel counts to match the memory footprint and FLOPs of plain-CNN 50. What will be the drop in accuracy?