Review for NeurIPS paper: Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
Weaknesses:
- The paper assumes the student network f is overparameterized, but f is never formally defined. There is also no discussion of why an overparameterized f is chosen, beyond the fact that it simplifies the analysis. In practice, student networks are usually small, as the introduction itself notes; why, then, is it meaningful to analyze an overparameterized student? More importantly, the paper assumes that training converges when f is overparameterized, which is in fact not guaranteed.
Feb-7-2025, 23:37:40 GMT