LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Zhang, Renrui, Han, Jiaming, Liu, Chris, Gao, Peng, Zhou, Aojun, Hu, Xiangfei, Yan, Shilin, Lu, Pan, Li, Hongsheng, Qiao, Yu
arXiv.org Artificial Intelligence
Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model and takes less than one hour to fine-tune on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions to learn an image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach.
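The zero-gated attention described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the authors' implementation. It assumes that attention over the adaption prompts is normalized separately from attention over the word tokens and then scaled by a learnable scalar gate initialized to zero. The function name `zero_gated_attention`, its arguments, and the tensor shapes are hypothetical, and the causal mask and multi-head projections are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def zero_gated_attention(q, k_words, v_words, k_prompt, v_prompt, gate):
    """Single-head attention with a gated contribution from adaption prompts.

    q:        (T, d) queries for the word tokens
    k_words:  (T, d) keys of the word tokens
    v_words:  (T, d) values of the word tokens
    k_prompt: (K, d) keys of the learnable adaption prompts (assumed shape)
    v_prompt: (K, d) values of the learnable adaption prompts (assumed shape)
    gate:     scalar gating factor, initialized to 0.0 before training
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    # Assumption: the two score groups are normalized independently, so a
    # zero gate removes the prompts without disturbing the word-token weights.
    word_weights = softmax(q @ k_words.T * scale)      # (T, T)
    prompt_weights = softmax(q @ k_prompt.T * scale)   # (T, K)
    return word_weights @ v_words + gate * (prompt_weights @ v_prompt)

rng = np.random.default_rng(0)
T, K, d = 4, 3, 8
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
k_p, v_p = (rng.normal(size=(K, d)) for _ in range(2))

# With gate = 0 the output equals plain attention over the word tokens only.
out_init = zero_gated_attention(q, k, v, k_p, v_p, gate=0.0)
# A non-zero gate (as training might produce) lets the prompts contribute.
out_trained = zero_gated_attention(q, k, v, k_p, v_p, gate=0.5)
```

With the gate at zero, the frozen model's behavior is unchanged at the start of training, which is one way to read the abstract's claim that pre-trained knowledge is preserved while new instructional cues are injected gradually.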
Jun-14-2023
- Country:
- South America
- North America
- United States > California
- Los Angeles County > Los Angeles (0.14)
- Mexico > Mexico City
- Mexico City (0.04)
- Canada
- British Columbia (0.04)
- Prince Edward Island (0.04)
- Newfoundland and Labrador > Labrador (0.04)
- Alberta (0.04)
- Manitoba (0.04)
- Ontario (0.04)
- Nova Scotia (0.04)
- Saskatchewan (0.04)
- Quebec (0.04)
- Nunavut (0.04)
- Europe > Romania
- Asia > China
- Africa > Middle East
- Egypt (0.04)
- Genre:
- Research Report (0.50)
- Personal (0.47)
- Industry:
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Natural Language
- Large Language Model (1.00)
- Chatbot (0.95)
- Text Processing (0.88)
- Machine Learning > Neural Networks
- Deep Learning (1.00)