Bootstrap Your Own Context Length
Liang Wang, Nan Yang, Xingxing Zhang, Xiaolong Huang, Furu Wei
arXiv.org Artificial Intelligence
We introduce a bootstrapping approach to train long-context language models by exploiting only their short-context capabilities. Our method uses a simple agent workflow to synthesize diverse long-context instruction tuning data, eliminating the need for manual data collection and annotation. The proposed data synthesis workflow requires only a short-context language model, a text retriever, and a document collection, all of which are readily accessible within the open-source ecosystem. Language models are then fine-tuned on the synthesized data to extend their context lengths. In this manner, we effectively transfer the short-context capabilities of language models to long-context scenarios through a bootstrapping process. We conduct experiments with the open-source Llama-3 family of models and demonstrate that our method can successfully extend the context length to up to 1M tokens, achieving superior performance across various benchmarks.
Dec-25-2024
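
The abstract's synthesis workflow lends itself to a short sketch. The snippet below is a minimal Python illustration of how such a bootstrapping loop might look, not the paper's exact procedure: `corpus`, `retrieve`, and `generate` are hypothetical placeholders standing in for the document collection, the text retriever, and the short-context language model, and the specific prompting and assembly steps are assumptions.

```python
# A minimal sketch of the bootstrapping data-synthesis loop described in the
# abstract. All names here (corpus, retrieve, generate) are hypothetical
# placeholders, not the authors' actual interfaces.

import random
from typing import Callable

def synthesize_long_context_example(
    corpus: list[str],                          # the document collection
    retrieve: Callable[[str, int], list[str]],  # text retriever: (query, k) -> docs
    generate: Callable[[str], str],             # short-context LM: prompt -> completion
    num_docs: int = 64,
) -> dict:
    """Build one long-context instruction-tuning example from short-context parts."""
    # 1. Seed with a random document, then pull topically related documents
    #    so the concatenated context is long but coherent.
    seed = random.choice(corpus)
    related = retrieve(seed, num_docs - 1)
    long_context = "\n\n".join([seed] + related)

    # 2. Have the short-context model write an instruction and answer grounded
    #    in a single retrieved passage; the model never sees more than a
    #    short-context-sized chunk at a time.
    focus = random.choice(related)
    instruction = generate(
        f"Write a question answerable only from this passage:\n{focus}"
    )
    answer = generate(f"Passage:\n{focus}\n\nQuestion: {instruction}\nAnswer:")

    # 3. Pair the full concatenated context with the synthesized instruction
    #    and answer; fine-tuning on such pairs trains the model to locate and
    #    use information across a much longer context.
    return {
        "context": long_context,
        "instruction": instruction,
        "response": answer,
    }
```

Repeating a loop of this shape over a large corpus yields diverse long-context examples with no human annotation, which is what lets the fine-tuned model inherit, at long range, behaviors the base model only exhibited at short range.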