An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

Neural Information Processing Systems 

Transformer-based Large Language Models (LLMs) are typically pre-trained with a fixed context window size, e.g., 4K tokens in Touvron et al. [2023a].
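The fixed pre-training window matters because it bounds the position range the positional encoding was ever trained on. The sketch below is a generic illustration of this constraint (not the paper's middle-focused recipe): with rotary position embeddings (RoPE), positions beyond the trained window produce rotation angles the model never saw, and a simple linear position-interpolation remap squeezes extended positions back into the trained range. The function name, dimensions, and the 16K target length are illustrative assumptions.

```python
# Minimal sketch, assuming RoPE-style positional encoding and a 4K
# pre-trained window; this is a generic illustration, not the method
# proposed in the paper.
import numpy as np

def rope_angles(positions, dim=64, base=10000.0):
    """Rotation angles RoPE applies at each position, per frequency pair."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    return np.outer(positions, inv_freq)                     # (len, dim/2)

trained_window = 4096   # fixed pre-training context, e.g. 4K tokens
target_length = 16384   # desired extended context (illustrative)

# Naive extrapolation: positions past the trained window yield angle
# magnitudes outside the distribution seen during pre-training.
raw_positions = np.arange(target_length)

# Linear position interpolation: rescale positions so the longest
# sequence still maps onto the trained [0, trained_window) range.
scale = trained_window / target_length
interpolated_positions = raw_positions * scale

print(rope_angles(raw_positions[-1:]).max())           # out-of-range angle
print(rope_angles(interpolated_positions[-1:]).max())  # within trained range
```

The trade-off this illustrates is that uniform rescaling compresses all relative distances equally, which is part of why more targeted remappings of the position range are of interest for context extension.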
