YaRN: Efficient Context Window Extension of Large Language Models

Peng, Bowen, Quesnelle, Jeffrey, Fan, Honglu, Shippole, Enrico

Nov-1-2023–arXiv.org Artificial Intelligence

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x less tokens and 2.5x less training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing previous the state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. The models fine-tuned using YaRN has been made available and reproduced online up to 128k context length at https://github.com/jquesnelle/yarn

efficient context window extension, language model, yarn

arXiv.org Artificial Intelligence

Nov-1-2023

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found