Bootstrap Your Own Context Length
Liang Wang, Nan Yang, Xingxing Zhang, Xiaolong Huang, Furu Wei
arXiv.org Artificial Intelligence
We introduce a bootstrapping approach to train long-context language models by exploiting only their short-context capabilities. Our method uses a simple agent workflow to synthesize diverse long-context instruction tuning data, eliminating the need for manual data collection and annotation. The proposed data synthesis workflow requires only a short-context language model, a text retriever, and a document collection, all of which are readily accessible within the open-source ecosystem. Language models are then fine-tuned on the synthesized data to extend their context lengths. In this manner, we effectively transfer the short-context capabilities of language models to long-context scenarios through a bootstrapping process. We conduct experiments with the open-source Llama-3 family of models and demonstrate that our method can successfully extend the context length to up to 1M tokens, achieving superior performance across various benchmarks.
Dec-25-2024
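
The abstract's synthesis workflow lends itself to a short sketch. The snippet below is a minimal Python illustration of how such a bootstrapping loop might look, not the paper's exact procedure: `corpus`, `retrieve`, and `generate` are hypothetical placeholders standing in for the document collection, the text retriever, and the short-context language model, and the specific prompting and assembly steps are assumptions.

```python
# A minimal sketch of the bootstrapping data-synthesis loop described in the
# abstract. All names here (corpus, retrieve, generate) are hypothetical
# placeholders, not the authors' actual interfaces.

import random
from typing import Callable

def synthesize_long_context_example(
    corpus: list[str],                          # the document collection
    retrieve: Callable[[str, int], list[str]],  # text retriever: (query, k) -> docs
    generate: Callable[[str], str],             # short-context LM: prompt -> completion
    num_docs: int = 64,
) -> dict:
    """Build one long-context instruction-tuning example from short-context parts."""
    # 1. Seed with a random document, then pull topically related documents
    #    so the concatenated context is long but coherent.
    seed = random.choice(corpus)
    related = retrieve(seed, num_docs - 1)
    long_context = "\n\n".join([seed] + related)

    # 2. Have the short-context model write an instruction and answer grounded
    #    in a single retrieved passage; the model never sees more than a
    #    short-context-sized chunk at a time.
    focus = random.choice(related)
    instruction = generate(
        f"Write a question answerable only from this passage:\n{focus}"
    )
    answer = generate(f"Passage:\n{focus}\n\nQuestion: {instruction}\nAnswer:")

    # 3. Pair the full concatenated context with the synthesized instruction
    #    and answer; fine-tuning on such pairs trains the model to locate and
    #    use information across a much longer context.
    return {
        "context": long_context,
        "instruction": instruction,
        "response": answer,
    }
```

Repeating a loop of this shape over a large corpus yields diverse long-context examples with no human annotation, which is what lets the fine-tuned model inherit, at long range, behaviors the base model only exhibited at short range.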