DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

Zhen Tan, Daize Dong, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen

arXiv.org Artificial Intelligence 

In this paper, we introduce Dynamic Layer Operation (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves results comparable to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance.
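To make the layer-level routing idea concrete, below is a minimal PyTorch sketch of a per-layer skip mechanism. The names (`LayerRouter`, `DynamicLayer`), the threshold, and the use of a small learned linear gate are illustrative assumptions; the paper's actual policy routes on layerwise feature similarity and also supports layer expansion, which this sketch does not implement. It is not the authors' released code.

```python
# Illustrative sketch of dynamic layer skipping (assumed design, not the DLO release).
import torch
import torch.nn as nn


class LayerRouter(nn.Module):
    """Tiny gate scoring whether a layer should be executed for a given input.

    Assumption: a learned linear gate stands in for the paper's
    similarity-based routing policy.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool over the sequence, then map to a per-sample execution probability.
        pooled = hidden_states.mean(dim=1)          # (batch, hidden)
        return torch.sigmoid(self.gate(pooled)).squeeze(-1)  # (batch,)


class DynamicLayer(nn.Module):
    """Wraps an ordinary transformer layer with a skip/execute decision."""

    def __init__(self, layer: nn.Module, hidden_size: int, threshold: float = 0.5):
        super().__init__()
        self.layer = layer
        self.router = LayerRouter(hidden_size)
        self.threshold = threshold

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        prob = self.router(hidden_states)           # (batch,)
        if self.training:
            # Soft blend keeps the router differentiable during SFT.
            out = self.layer(hidden_states)
            w = prob[:, None, None]
            return w * out + (1 - w) * hidden_states
        # Hard decision at inference: skip the layer for low-scoring inputs.
        execute = prob > self.threshold
        if not execute.any():
            return hidden_states
        out = hidden_states.clone()
        out[execute] = self.layer(hidden_states[execute])
        return out


if __name__ == "__main__":
    hidden = 64
    block = DynamicLayer(
        nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), hidden
    )
    x = torch.randn(2, 10, hidden)
    print(block(x).shape)  # torch.Size([2, 10, 64])
```

At inference the hard decision avoids computing the wrapped layer for skipped samples, which is where the depth-wise compute savings would come from; during fine-tuning the soft blend lets the gate be trained with ordinary backpropagation.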
