LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention

Kolomeitsev, Konstantin

arXiv.org Artificial Intelligence 

Large language models (LLMs) have demonstrated outstanding performance in natural language processing tasks; however, their training and deployment require significant computational resources. This has led to the need for methods that transfer knowledge from large pre-trained models to smaller models. Such approaches are especially relevant for applied tasks with limited computational resources. In this work, we propose a modular LLM architecture in which a large model serves as a knowledge source, while a smaller model receives external representations via Enhanced Cross-Attention and generates responses. This method significantly reduces training costs while remaining effective for solving specific business tasks.
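The abstract does not spell out how the Enhanced Cross-Attention wiring works, so the following is a minimal PyTorch sketch of the general idea only: a frozen large model supplies hidden states, and a trainable cross-attention bridge lets the small model's hidden states attend over them. The class name, the projection layer, and all dimensions are illustrative assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn


class CrossAttentionBridge(nn.Module):
    """Lets a small model's hidden states attend over a large model's outputs (hypothetical sketch)."""

    def __init__(self, small_dim: int, large_dim: int, n_heads: int = 8):
        super().__init__()
        # Project the large model's representations into the small model's hidden space.
        self.kv_proj = nn.Linear(large_dim, small_dim)
        self.attn = nn.MultiheadAttention(small_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(small_dim)

    def forward(self, small_hidden: torch.Tensor, large_hidden: torch.Tensor) -> torch.Tensor:
        # small_hidden: (batch, tgt_len, small_dim) from the trainable small model
        # large_hidden: (batch, src_len, large_dim) from the frozen large model
        kv = self.kv_proj(large_hidden)
        attended, _ = self.attn(query=small_hidden, key=kv, value=kv)
        # Residual connection so the small model can downweight the external
        # signal early in training.
        return self.norm(small_hidden + attended)


if __name__ == "__main__":
    bridge = CrossAttentionBridge(small_dim=512, large_dim=4096)
    small_h = torch.randn(2, 16, 512)   # small model hidden states
    large_h = torch.randn(2, 64, 4096)  # frozen large model hidden states
    print(bridge(small_h, large_h).shape)  # torch.Size([2, 16, 512])
```

Under this reading, only the bridge (and the small model) would be trained while the large model stays frozen, which is what keeps the training cost low.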