Can Gradient Descent Simulate Prompting?

Zhang, Eric, Choshen, Leshem, Andreas, Jacob

Jun-27-2025–arXiv.org Artificial Intelligence

There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g. via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM's own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model performance -- showing improvement on the ``reversal curse'' tasks, and answering questions about text passages after a single gradient update. These results suggest that, with appropriate initialization, gradient descent can be surprisingly expressive. Our results suggest new avenues for long-context modeling and offer insight into the generalization capabilities of gradient-based learning.

artificial intelligence, machine learning, semanticscholar, (17 more...)

arXiv.org Artificial Intelligence

Jun-27-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - United States > Massachusetts
    - Middlesex County > Cambridge (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Switzerland (0.04)
  - United Kingdom > Scotland (0.04)
  - Germany (0.04)
  - Austria (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Saudi Arabia > Asir Province
    - Abha (0.04)

Genre:
- Research Report > New Finding (0.74)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found