Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
arXiv.org Artificial Intelligence
The development of large language models has led to a pre-train-then-align paradigm, in which a model is typically pre-trained on a large text corpus and then undergoes a tuning stage that aligns it with human preferences or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. Our results on 18 datasets suggest that i) continual pre-training improves the model in a latent way that is revealed only after fine-tuning; ii) with extra fine-tuning, datasets on which the model shows no capability during pre-training gain much more than those on which it already performs well; iii) although the model benefits significantly from supervised fine-tuning, it may forget previously known domain knowledge and tasks that are not seen during fine-tuning; iv) the model exhibits high sensitivity to evaluation prompts after supervised fine-tuning, but this sensitivity can be alleviated by more pre-training.
Aug-14-2024
- Country:
  - North America
    - United States
      - Maryland > Baltimore (0.04)
      - Minnesota > Hennepin County > Minneapolis (0.14)
      - Louisiana > Orleans Parish > New Orleans (0.04)
    - Canada > Ontario > Toronto (0.04)
  - Europe
    - Czechia > Prague (0.04)
    - Sweden > Vaestra Goetaland > Gothenburg (0.04)
    - Spain > Catalonia > Barcelona Province > Barcelona (0.04)
    - Italy > Tuscany > Florence (0.04)
    - Ireland > Leinster > County Dublin > Dublin (0.04)
    - Belgium > Brussels-Capital Region > Brussels (0.04)
  - Asia
    - Singapore (0.04)
    - China > Hong Kong (0.04)
    - Middle East
      - Jordan (0.04)
      - Israel (0.04)
      - UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Education (0.46)
- Technology: