Mixed-effects transformers for hierarchical adaptation

White, Julia; Goodman, Noah; Hawkins, Robert

arXiv.org Artificial Intelligence 

Language differs dramatically from context to context. To some degree, large language models like GPT-3 account for such variation by conditioning on strings of initial input text, or prompts. However, prompting can be ineffective when contexts are sparse, out-of-sample, or extra-textual. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically structured prefixes (lightweight modules prepended to an input sequence) to account for structured variation in language use. Specifically, we show how the popular class of mixed-effects regression models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it learns contextual variation from minimal data while ...

Figure 1: In the mixed-effects transformer (MET), parameters of a pretrained transformer are frozen (solid border) while prefixes are adapted to different contextual features (dashed border).
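The mixed-effects idea named in the abstract can be illustrated with a small sketch. It assumes the additive form of classical mixed-effects regression: a shared (fixed-effect) prefix plus a per-context (random-effect) offset, with an L2 penalty shrinking offsets toward zero and dropout occasionally removing the offset during training. The function names, the flattened three-value "prefix", and the context labels are all hypothetical; this is not the authors' implementation, and the frozen transformer itself is left out.

```python
import random

def compose_prefix(fixed, group_offsets, group, p_drop=0.0, rng=None):
    """Return the prefix for `group`: shared prefix plus that group's offset.

    Assumption: additive composition, as in mixed-effects regression.
    With probability `p_drop` the offset is dropped (training-time dropout),
    so the shared prefix alone must account for the example.
    Contexts with no learned offset fall back to the shared prefix.
    """
    rng = rng or random
    offset = group_offsets.get(group)
    if offset is None or rng.random() < p_drop:
        return list(fixed)
    return [f + o for f, o in zip(fixed, offset)]

def offset_penalty(group_offsets, lam):
    """L2 penalty on the offsets (equivalent to a zero-mean Gaussian prior)."""
    return lam * sum(o * o for offset in group_offsets.values() for o in offset)

# Hypothetical toy values: a real prefix would be a tensor of key/value
# activations per layer, not three floats.
fixed = [0.5, -1.0, 2.0]
group_offsets = {"news": [0.1, 0.0, -0.2]}

seen = compose_prefix(fixed, group_offsets, "news")       # shared + offset
unseen = compose_prefix(fixed, group_offsets, "recipes")  # shared only
penalty = offset_penalty(group_offsets, lam=0.5)
```

In a training loop under these assumptions, `penalty` would be added to the language-modeling loss, and only `fixed` and `group_offsets` would receive gradient updates while the pretrained weights stay frozen.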
