DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer

Jun-15-2026, 18:43:52 GMT–Neural Information Processing Systems

Recent advances in knowledge distillation have emphasized the importance of decoupling different knowledge components. While existing methods utilize momentum mechanisms to separate task-oriented and distillation gradients, they overlook the inherent conflict between target-class and non-target-class knowledge flows. Furthermore, low-confidence dark knowledge in non-target classes introduces noisy signals that hinder effective knowledge transfer. To address these limitations, we propose DeepKD, a novel training framework that integrates dual-level decoupling with adaptive denoising. First, through theoretical analysis of gradient signal-to-noise ratio (GSNR) characteristics in task-oriented and non-task-oriented knowledge distillation, we design independent momentum updaters for each component to prevent mutual interference. We observe that the optimal momentum coefficients for task-oriented gradient (TOG), target-class gradient (TCG), and non-target-class gradient (NCG) should be positively related to their GSNR. Second, we introduce a dynamic top-k mask (DTM) mechanism that gradually increases K from a small initial value to incorporate more non-target classes as training progresses, following curriculum learning principles. The DTM jointly filters low-confidence logits from both teacher and student models, effectively purifying dark knowledge during early training. Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate DeepKD's effectiveness.
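The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule for K, ranking non-target classes by teacher logits, the plain `m * v + g` momentum rule, and every name below (`dtm_k`, `top_k_non_target_mask`, `decoupled_momentum_step`) are assumptions made for illustration, and the coefficient values are placeholders rather than values from the paper.

```python
from typing import Dict, List


def dtm_k(step: int, total_steps: int, k_start: int, k_end: int) -> int:
    """Grow K from k_start to k_end over training.

    Assumption: a linear schedule; the abstract only says K increases gradually.
    """
    if total_steps <= 1:
        return k_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return k_start + round(frac * (k_end - k_start))


def top_k_non_target_mask(teacher_logits: List[float], target: int, k: int) -> List[bool]:
    """Mark the k non-target classes with the highest teacher logits.

    Assumption: ranking uses teacher logits. The same mask would then be
    applied to both teacher and student logits so they are filtered jointly.
    The target class is never marked; it is handled by a separate loss term.
    """
    non_target = [c for c in range(len(teacher_logits)) if c != target]
    ranked = sorted(non_target, key=lambda c: teacher_logits[c], reverse=True)
    keep = set(ranked[:k])
    return [c in keep for c in range(len(teacher_logits))]


def decoupled_momentum_step(
    buffers: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    coeffs: Dict[str, float],
) -> List[float]:
    """Update one momentum buffer per gradient component and sum the results.

    Each of TOG, TCG, and NCG keeps its own buffer and coefficient, so one
    component's history never mixes into another's buffer.
    """
    size = len(next(iter(grads.values())))
    update = [0.0] * size
    for name in ("tog", "tcg", "ncg"):
        m = coeffs[name]
        # Assumed SGD-style momentum rule: v <- m * v + g
        buffers[name] = [m * v + g for v, g in zip(buffers[name], grads[name])]
        update = [u + v for u, v in zip(update, buffers[name])]
    return update


if __name__ == "__main__":
    # Five classes, class 0 is the target, keep the two strongest non-target classes.
    print(top_k_non_target_mask([2.0, 0.1, 1.5, -0.3, 0.9], target=0, k=2))

    # Placeholder coefficients; the abstract states only that they should be
    # positively related to each component's GSNR.
    coeffs = {"tog": 0.9, "tcg": 0.5, "ncg": 0.1}
    buffers = {name: [0.0, 0.0] for name in coeffs}
    grads = {"tog": [1.0, 0.0], "tcg": [0.0, 1.0], "ncg": [1.0, 1.0]}
    print(decoupled_momentum_step(buffers, grads, coeffs))
```

In a real trainer the mask would be recomputed per sample at every step with the current value of `dtm_k`, and the three gradients would come from backpropagating the task loss, the target-class distillation loss, and the masked non-target-class distillation loss separately.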

distillation, knowledge management, machine learning


Country:
- Asia (0.29)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Education > Educational Technology > Educational Software (0.34)

Technology:
- Information Technology
  - Knowledge Management (0.68)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Representation & Reasoning > Agents (0.67)
    - Machine Learning > Neural Networks
      - Deep Learning (0.93)
