Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
Neural Information Processing Systems
LLMs show remarkable emergent abilities, such as inferring concepts from presumably out-of-distribution prompts, known as in-context learning. Though this success is often attributed to the Transformer architecture, our systematic understanding is limited. In complex real-world datasets, even defining what is out-of-distribution is not obvious. To better understand the OOD behaviour of autoregressive LLMs, we focus on formal languages, which are defined by the intersection of rules. We define a new scenario of OOD compositional generalization, termed "rule extrapolation".
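The setup described above can be sketched concretely. A minimal illustration, with hypothetical rules chosen for this sketch (the paper's exact grammars may differ): a formal language is the intersection of rules, an OOD prompt violates one rule, and rule extrapolation asks whether a model's completion still respects the remaining rules.

```python
# Sketch: a formal language as an intersection of rules (rules are
# illustrative assumptions, not necessarily those used in the paper).

def rule_starts_with_a(s: str) -> bool:
    """Rule 1: the string must begin with 'a'."""
    return s.startswith("a")

def rule_balanced(s: str) -> bool:
    """Rule 2: equal counts of 'a' and 'b'."""
    return s.count("a") == s.count("b")

def in_language(s: str) -> bool:
    """Membership = satisfying ALL rules (their intersection)."""
    return rule_starts_with_a(s) and rule_balanced(s)

# In-distribution string: satisfies both rules.
assert in_language("aabb")

# OOD prompt: begins with 'b', so Rule 1 is already violated
# and no completion can repair it.
prompt = "b"

# Rule extrapolation: does the model's completion still satisfy
# the remaining rule (balanced counts), even though the prompt
# placed the string outside the training language?
completion = prompt + "aab"          # hypothetical model output
assert not rule_starts_with_a(completion)  # Rule 1 stays violated
assert rule_balanced(completion)           # Rule 2 is still upheld
```

The point of the sketch is that "OOD" here is well-defined: a prompt is out-of-distribution exactly when it already breaks at least one of the intersected rules.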