Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Yan, Tianyi Lorena; Jia, Robin
arXiv.org Artificial Intelligence
To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Then, attention attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to using early decoding and causal tracing, we analyze how components use different tokens by introducing both Token Lens, which decodes aggregated attention updates from specified tokens, and a knockout method that analyzes changes in MLP outputs after removing attention to specified tokens.
Figure 1: To answer one-to-many factual queries, we found that LMs first use attention to propagate subject information to the last token, which is used by MLPs to promote all possible answers. Attention then attends to and suppresses the subject and previous answer tokens, while MLPs amplify the suppression and further promote new answers.
Mar-5-2025
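The early-decoding evidence mentioned in the abstract can be pictured with a logit-lens-style probe: decode each layer's last-position hidden state through the final norm and unembedding, and watch whether candidate answers are promoted in middle layers while the already-generated answer is pushed down near the output. The sketch below is a minimal illustration under stated assumptions, not the paper's Token Lens or knockout implementation; the model name ("gpt2"), the prompt, and the candidate city tokens are illustrative choices, not taken from the paper.

```python
# Minimal early-decoding (logit-lens style) sketch; an illustration, not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any GPT-style causal LM works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A one-to-many query with one previously generated answer already in context.
prompt = "Some cities in Japan are Tokyo,"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Candidate answers to track; we keep only the first sub-token id of each
# (leading spaces matter for GPT-2's BPE vocabulary).
candidates = [" Tokyo", " Osaka", " Kyoto"]
cand_ids = [tok(c, add_special_tokens=False)["input_ids"][0] for c in candidates]

# Decode every layer's last-position hidden state through the final layer norm
# and unembedding, then print each candidate's logit per layer.  Under the
# promote-then-suppress account, all candidates should rise in middle layers and
# the already-generated " Tokyo" should be suppressed near the top.
final_norm = model.transformer.ln_f
unembed = model.lm_head
for layer, hidden in enumerate(out.hidden_states):
    logits = unembed(final_norm(hidden[0, -1]))
    scores = ", ".join(
        f"{c.strip()}: {logits[i].item():.2f}" for c, i in zip(candidates, cand_ids)
    )
    print(f"layer {layer:2d}  {scores}")
```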