Review for NeurIPS paper: Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
Weaknesses:
- The paper assumes the student network f is overparameterized, but f is never formally defined. There is also no discussion of why an overparameterized f is chosen, beyond the fact that it simplifies the analysis. In practice, student networks are usually small, as the introduction itself notes; why, then, is it meaningful to analyze an overparameterized student? More importantly, the paper assumes that training converges when f is overparameterized, which is in fact not guaranteed.
Feb-7-2025, 23:37:40 GMT