Mixed-effects transformers for hierarchical adaptation

Julia White, Noah Goodman, Robert Hawkins

Dec-8-2022–arXiv.org Artificial Intelligence

Language differs dramatically from context to context. To some degree, large language models like GPT-3 account for such variation by conditioning on strings of initial input text, or prompts. However, prompting can be ineffective when contexts are sparse, out-of-sample, or extra-textual. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically structured prefixes (lightweight modules prepended to an input sequence) to account for structured variation in language use. Specifically, we show how the popular class of mixed-effects regression models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it learns contextual variation from minimal data while …

Figure 1: In the mixed-effects transformer (MET), parameters of a pretrained transformer are frozen (solid border) while prefixes are adapted to different contextual features (dashed border).
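The abstract describes prefixes adapted per context, regularized in the spirit of mixed-effects regression and trained with dropout. The sketch below is one hedged reading of that idea, not the authors' implementation: the class `HierarchicalPrefix`, its parameters `l2_weight` and `drop_prob`, and the squared-error stand-in for the language-model loss are all hypothetical. A shared prefix plays the role of a fixed effect; per-context offsets play the role of random effects, shrunk toward zero by an L2 penalty; and randomly dropping the offset during training keeps the shared prefix usable on its own, which is what an unseen context falls back on. Prefixes are flattened to plain float vectors here, whereas a real model would hold prefix activations inside a frozen transformer.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]


class HierarchicalPrefix:
    """Hypothetical toy: shared (fixed-effect) prefix + per-context (random-effect) offsets."""

    def __init__(self, dim: int, l2_weight: float = 0.1, drop_prob: float = 0.2,
                 seed: int = 0) -> None:
        self.dim = dim
        self.l2_weight = l2_weight  # assumed shrinkage strength on context offsets
        self.drop_prob = drop_prob  # assumed chance of hiding a context offset in training
        self.shared: Vector = [0.0] * dim
        self.offsets: Dict[str, Vector] = {}
        self.rng = random.Random(seed)

    def offset(self, context: str) -> Vector:
        # Create a zero offset the first time a training context is seen.
        return self.offsets.setdefault(context, [0.0] * self.dim)

    def compose(self, context: Optional[str], training: bool = False) -> Vector:
        """Shared prefix plus the context offset; shared prefix alone if the
        context is unseen or (in training) its offset is dropped."""
        known = context is not None and context in self.offsets
        dropped = training and self.rng.random() < self.drop_prob
        if not known or dropped:
            return list(self.shared)
        return [s + o for s, o in zip(self.shared, self.offsets[context])]

    def penalty(self) -> float:
        # L2 penalty on offsets only, the analogue of a zero-mean Gaussian prior.
        return self.l2_weight * sum(o * o for vec in self.offsets.values() for o in vec)

    def sgd_step(self, context: str, target: Vector, lr: float = 0.1) -> float:
        """One gradient step on ||prefix - target||^2 + l2_weight * ||offset||^2.
        The squared error is a placeholder for the frozen model's loss."""
        off = self.offset(context)
        dropped = self.rng.random() < self.drop_prob
        if dropped:
            prefix = list(self.shared)
        else:
            prefix = [s + o for s, o in zip(self.shared, off)]
        residual = [p - t for p, t in zip(prefix, target)]
        for i, r in enumerate(residual):
            data_grad = 2.0 * r
            self.shared[i] -= lr * data_grad
            # The offset gets the data gradient only when it was used,
            # but is always shrunk toward zero by the L2 term.
            off_grad = (0.0 if dropped else data_grad) + 2.0 * self.l2_weight * off[i]
            off[i] -= lr * off_grad
        return sum(r * r for r in residual)
```

With `l2_weight=0` and `drop_prob=0`, repeated steps on a single context split the fit evenly between `shared` and that context's offset. A positive `l2_weight` tends to move shared structure into `shared`, and dropout forces `shared` to fit the data without any offset, so `compose` on an unseen context returns a prefix that has been trained directly rather than left at its initial value.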

large language model, machine learning, natural language
