Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
Neural Information Processing Systems
LLMs show remarkable emergent abilities, such as inferring concepts from presumably out-of-distribution prompts, known as in-context learning. Though this success is often attributed to the Transformer architecture, our systematic understanding is limited. In complex real-world datasets, even defining what is out-of-distribution is not obvious. To better understand the OOD behaviour of autoregressive LLMs, we focus on formal languages, which are defined by the intersection of rules. We define a new scenario of OOD compositional generalization, termed "rule extrapolation".
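The setup described above can be sketched concretely. A minimal illustration, with hypothetical rules chosen for this sketch (the paper's exact grammars may differ): a formal language is the intersection of rules, an OOD prompt violates one rule, and rule extrapolation asks whether a model's completion still respects the remaining rules.

```python
# Sketch: a formal language as an intersection of rules (rules are
# illustrative assumptions, not necessarily those used in the paper).

def rule_starts_with_a(s: str) -> bool:
    """Rule 1: the string must begin with 'a'."""
    return s.startswith("a")

def rule_balanced(s: str) -> bool:
    """Rule 2: equal counts of 'a' and 'b'."""
    return s.count("a") == s.count("b")

def in_language(s: str) -> bool:
    """Membership = satisfying ALL rules (their intersection)."""
    return rule_starts_with_a(s) and rule_balanced(s)

# In-distribution string: satisfies both rules.
assert in_language("aabb")

# OOD prompt: begins with 'b', so Rule 1 is already violated
# and no completion can repair it.
prompt = "b"

# Rule extrapolation: does the model's completion still satisfy
# the remaining rule (balanced counts), even though the prompt
# placed the string outside the training language?
completion = prompt + "aab"          # hypothetical model output
assert not rule_starts_with_a(completion)  # Rule 1 stays violated
assert rule_balanced(completion)           # Rule 2 is still upheld
```

The point of the sketch is that "OOD" here is well-defined: a prompt is out-of-distribution exactly when it already breaks at least one of the intersected rules.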