RLPrompt: Optimizing discrete text prompts with reinforcement learning
Figure 1: Overview of RLPrompt for discrete prompt optimization. All language models (LMs) are frozen. We build our policy network by inserting a task-specific multi-layer perceptron (MLP) into a frozen pre-trained LM and training only the MLP. The figure illustrates (1) generation of a prompt (left), (2) example uses of the prompt in a masked LM for classification (top right) and in a left-to-right LM for generation (bottom right), and (3) updates to the MLP from RL reward signals (red arrows).

TL;DR: Prompting enables large language models (LLMs) to perform various NLP tasks without changing the model.
Mar-7-2023, 12:15:03 GMT