Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

Neural Information Processing Systems 

Traditional knowledge distillation (KD) methods rely on manually designed student architectures to compress large models under a pre-specified computational cost.
