Gradient Ascent Post-training Enhances Language Model Generalization
Yoon, Dongkeun, Jang, Joel, Kim, Sungdong, Seo, Minjoon
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
arXiv.org Artificial Intelligence
Jun-12-2023
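
As described in the abstract, GAP amounts to a few optimizer steps that maximize, rather than minimize, the language-modeling loss on random, unlabeled text. The sketch below illustrates that idea with PyTorch and Hugging Face Transformers; the model name, step count, learning rate, and placeholder data are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of Gradient Ascent Post-training (GAP): a handful of updates
# that ascend (maximize) the language-modeling loss on unlabeled text.
# Hyperparameters and model choice below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"  # assumption: any pretrained causal LM of similar size
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # assumed learning rate

# Placeholder for sentences drawn from a random, unlabeled corpus.
unlabeled_texts = ["An example sentence from an unlabeled text corpus."]

model.train()
for step, text in enumerate(unlabeled_texts[:10]):  # "just a few steps"
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss  # negate the LM loss so each update ascends it
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After these few ascent steps, the model would be evaluated zero-shot on downstream NLP tasks, with no task-specific fine-tuning involved.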