Review for NeurIPS paper: Kernel Based Progressive Distillation for Adder Neural Networks


Weaknesses: The effectiveness of the kernel method, one of the claimed contributions, is not fully justified. As shown in Table 1, the kernel operation brings only a marginal gain on CIFAR-10 with a shallow network (ResNet-20). The gains (below 0.21%) seem insignificant and may be attributable to the stochastic initialization of the networks, suggesting that the proposed kernel scheme may not be as effective as advocated. I suggest that a comparison be performed on ImageNet with a deeper network (e.g., ResNet-50). The current experiments are not strong enough to support the claim that the proposed method is a competitive knowledge distillation method.