Review for NeurIPS paper: Kernel Based Progressive Distillation for Adder Neural Networks


Weaknesses: The effectiveness of the kernel method, one of the claimed contributions, is not fully justified. As shown in Table 1, the kernel operation brings only a marginal gain on CIFAR-10 with a shallow network (ResNet-20). The gains (below 0.21%) seem insignificant and may be attributable to the stochastic initialization of the networks, suggesting that the proposed kernel scheme may not be as effective as advocated. I suggest that a comparison be performed on ImageNet with a deeper network (e.g., ResNet-50). The current experiments are not strong enough to support the claim that the proposed method is a competitive knowledge distillation method.