Distillation Algorithms for Knowledge Distillation
Continuing the Knowledge Distillation series, this is the third blog post, in which I discuss distillation algorithms. For better context, please read the first blog post on knowledge distillation. In knowledge distillation, the teacher-student architecture is the generic carrier for knowledge transfer. How well the student acquires and distills knowledge from the teacher depends on how that architecture is designed. Knowledge distillation was originally designed to compress an ensemble of deep neural networks. The complexity of a deep neural network comes mainly from two dimensions: its depth and its width.
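To make the depth and width point concrete, here is a minimal sketch that counts the parameters of a plain fully connected network. The helper `mlp_param_count` and the layer sizes below are hypothetical, chosen only for illustration; they do not come from any particular teacher or student model discussed in this series.

```python
def mlp_param_count(input_dim, hidden_width, depth, output_dim):
    """Count weights and biases of a fully connected network.

    `depth` is the number of hidden layers, all of size `hidden_width`.
    This is an illustrative helper, not part of any library.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    sizes = [input_dim] + [hidden_width] * depth + [output_dim]
    # Each layer contributes fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hypothetical sizes: a deep, wide teacher and a shallow, narrow student.
teacher_params = mlp_param_count(784, hidden_width=1024, depth=4, output_dim=10)
student_params = mlp_param_count(784, hidden_width=128, depth=2, output_dim=10)

print(f"teacher: {teacher_params:,} parameters")
print(f"student: {student_params:,} parameters")
print(f"compression ratio: {teacher_params / student_params:.1f}x")
```

Under these assumed sizes, shrinking both depth and width cuts the parameter count by more than an order of magnitude, which is the kind of gap a student model is expected to bridge with the teacher's help.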
Jan-7-2023, 23:20:47 GMT