Gradient Ascent Post-training Enhances Language Model Generalization
Yoon, Dongkeun, Jang, Joel, Kim, Sungdong, Seo, Minjoon
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
arXiv.org Artificial Intelligence
Jun-12-2023
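
As described in the abstract, GAP amounts to a few optimizer steps that maximize, rather than minimize, the language-modeling loss on random, unlabeled text. The sketch below illustrates that idea with PyTorch and Hugging Face Transformers; the model name, step count, learning rate, and placeholder data are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of Gradient Ascent Post-training (GAP): a handful of updates
# that ascend (maximize) the language-modeling loss on unlabeled text.
# Hyperparameters and model choice below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"  # assumption: any pretrained causal LM of similar size
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # assumed learning rate

# Placeholder for sentences drawn from a random, unlabeled corpus.
unlabeled_texts = ["An example sentence from an unlabeled text corpus."]

model.train()
for step, text in enumerate(unlabeled_texts[:10]):  # "just a few steps"
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss  # negate the LM loss so each update ascends it
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After these few ascent steps, the model would be evaluated zero-shot on downstream NLP tasks, with no task-specific fine-tuning involved.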