Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Neural Information Processing Systems
Traditional knowledge distillation (KD) methods rely on manually designed student architectures to compress large models under a pre-specified computational cost.
Aug-18-2025, 04:10:25 GMT
- Country:
  - Asia > Singapore (0.04)
  - Oceania > Australia
  - North America > United States
    - Washington > King County
      - Seattle (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > San Diego County
      - San Diego (0.04)
    - Washington > King County
  - Europe
- Technology: