DDK: Distilling Domain Knowledge for Efficient Large Language Models

Mar-22-2026, 03:02:24 GMT–Neural Information Processing Systems

Despite the advanced capabilities of large language models (LLMs) across many applications, they still impose significant computational and storage demands. Knowledge distillation (KD) has emerged as an effective strategy for improving the performance of a smaller LLM (the student model) by transferring knowledge from a high-performing LLM (the teacher model). Prevailing LLM distillation techniques typically either use a black-box model API to generate high-quality pretraining and alignment datasets, or apply white-box distillation, altering the loss function to better transfer knowledge from the teacher LLM. However, these methods ignore the differences in knowledge between the student and teacher LLMs across domains.

artificial intelligence, large language model, natural language

Industry:
- Education (0.82)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)