Long Context Alignment with Short Instructions and Synthesized Positions

Wu, Wenhao, Wang, Yizhong, Fu, Yao, Yue, Xiang, Zhu, Dawei, Li, Sujian

May-6-2024–arXiv.org Artificial Intelligence

Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs during the alignment phase without requiring any effort beyond training at the original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity for long context. Rather than merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies at the level of position indices. This is achieved by strategically inserting skipped positions within instruction-following samples, which exploits the semantic structure of the data to effectively expand the context. Through extensive experiments on base models with a variety of context window sizes, SkipAlign demonstrates its effectiveness across a spectrum of long-context tasks. Particularly noteworthy is that, with a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
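The abstract does not spell out the exact skipping rule, so the following is only a minimal sketch of the general idea of synthesizing position indices with skips. It assumes a sample is already split into consecutive segments (for example instruction and response turns), that a random gap is inserted before each segment, and that all indices must stay below a target window size. The function name `skip_position_ids` and the parameters `segment_lengths` and `max_position` are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional


def skip_position_ids(
    segment_lengths: List[int],
    max_position: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Assign position ids to consecutive segments, skipping a random
    number of positions before each segment.

    Illustrative assumption only: the sampling rule here (uniform gaps
    drawn from a shared budget) is not claimed to match the paper.
    """
    rng = rng or random.Random()
    total_tokens = sum(segment_lengths)
    if total_tokens > max_position:
        raise ValueError("sample is longer than the target window")

    # Number of positions that can still be skipped without any id
    # reaching max_position.
    budget = max_position - total_tokens

    position_ids: List[int] = []
    next_pos = 0
    for length in segment_lengths:
        skip = rng.randint(0, budget)  # inclusive on both ends
        budget -= skip
        next_pos += skip
        # Tokens inside a segment keep consecutive positions, so local
        # relative distances are unchanged.
        position_ids.extend(range(next_pos, next_pos + length))
        next_pos += length
    return position_ids
```

With no room to skip, the function falls back to ordinary positions: `skip_position_ids([2, 2], max_position=4)` returns `[0, 1, 2, 3]`. With a larger `max_position`, the number of tokens stays the same while the relative distances between segments grow, which is the sense in which a short sample can expose a model to long-range position offsets.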

positional index, relative distance, skipalign

Country:
- North America
  - United States > Ohio (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia > Myanmar
  - Tanintharyi Region > Dawei (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.48)
