Mixed-effects transformers for hierarchical adaptation
Julia White, Noah Goodman, Robert Hawkins
arXiv.org Artificial Intelligence
Language differs dramatically from context to context. To some degree, large language models like GPT-3 account for such variation by conditioning on strings of initial input text, or prompts. However, prompting can be ineffective when contexts are sparse, out-of-sample, or extra-textual. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically structured prefixes (lightweight modules prepended to an input sequence) to account for structured variation in language use. Specifically, we show how the popular class of mixed-effects regression models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it learns contextual variation from minimal data while …

Figure 1: In the mixed-effects transformer (MET), parameters of a pretrained transformer are frozen (solid border) while prefixes are adapted to different contextual features (dashed border).
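To make the mixed-effects analogy in the abstract more concrete, here is a minimal sketch, not the authors' implementation. It assumes that the prefix for a context is a shared "fixed-effect" prefix plus one "random-effect" offset per contextual feature value, and that these are combined by summation (the actual MET may combine prefixes differently). It also assumes an L2 penalty that shrinks offsets toward zero, mirroring a zero-mean Gaussian prior on random effects, and a dropout that randomly skips a group's offset during training. Prefixes are plain lists of floats standing in for the learned vectors that a real prefix-tuning setup feeds into a frozen transformer. The class `HierarchicalPrefix` and its attributes are hypothetical names.

```python
import random


class HierarchicalPrefix:
    """Hypothetical sketch: shared (fixed-effect) prefix plus per-group
    (random-effect) offsets. Plain float lists stand in for the prefix
    vectors a real prefix-tuning model would prepend to a frozen transformer."""

    def __init__(self, dim, dropout=0.1, l2=0.01, seed=0):
        self.dim = dim
        self.dropout = dropout          # assumed prefix-dropout probability
        self.l2 = l2                    # assumed shrinkage strength
        self.rng = random.Random(seed)  # seeded for reproducible dropout
        self.shared = [0.0] * dim       # fixed effect, used for every context
        self.offsets = {}               # (feature, value) -> offset vector

    def prefix(self, context, training=False):
        """Compose a prefix for a context such as {"site": "reddit"}.
        Combining by summation is an assumption made for this sketch."""
        out = list(self.shared)
        for key in sorted(context.items()):
            offset = self.offsets.get(key)
            if offset is None:
                # Unseen group: back off to the shared prefix alone.
                continue
            if training and self.rng.random() < self.dropout:
                # Dropout: skip this group's offset so the shared prefix
                # must also work on its own (e.g. for out-of-sample contexts).
                continue
            out = [a + b for a, b in zip(out, offset)]
        return out

    def penalty(self):
        """L2 shrinkage of all offsets toward zero, analogous to a
        zero-mean Gaussian prior on random effects."""
        return self.l2 * sum(x * x for vec in self.offsets.values() for x in vec)


if __name__ == "__main__":
    mp = HierarchicalPrefix(dim=3, dropout=0.5, l2=0.1)
    mp.shared = [1.0, 1.0, 1.0]
    mp.offsets[("site", "reddit")] = [0.5, -0.5, 0.0]
    print(mp.prefix({"site": "reddit"}))  # [1.5, 0.5, 1.0]
    print(mp.prefix({"site": "unseen"}))  # [1.0, 1.0, 1.0]
```

In this sketch, a context that never appeared in training gets exactly the shared prefix. The penalty term, which would be added to the language-modeling loss, keeps each group's offset small unless the data for that group justify a larger one.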
December 8, 2022