Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Yan, Tianyi Lorena; Jia, Robin
arXiv.org Artificial Intelligence
To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Then, attention attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to using early decoding and causal tracing, we analyze how components use different tokens by introducing both Token Lens, which decodes aggregated attention updates from specified tokens, and a knockout method that analyzes changes in MLP outputs after removing attention to specified tokens.
Figure 1: To answer one-to-many factual queries, we found that LMs first use attention to propagate subject information to the last token, which is used by MLPs to promote all possible answers. Attention then attends to and suppresses the subject and previous answer tokens, while MLPs amplify the suppression and further promote new answers.
Mar-5-2025
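The early-decoding evidence mentioned in the abstract can be pictured with a logit-lens-style probe: decode each layer's last-position hidden state through the final norm and unembedding, and watch whether candidate answers are promoted in middle layers while the already-generated answer is pushed down near the output. The sketch below is a minimal illustration under stated assumptions, not the paper's Token Lens or knockout implementation; the model name ("gpt2"), the prompt, and the candidate city tokens are illustrative choices, not taken from the paper.

```python
# Minimal early-decoding (logit-lens style) sketch; an illustration, not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any GPT-style causal LM works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A one-to-many query with one previously generated answer already in context.
prompt = "Some cities in Japan are Tokyo,"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Candidate answers to track; we keep only the first sub-token id of each
# (leading spaces matter for GPT-2's BPE vocabulary).
candidates = [" Tokyo", " Osaka", " Kyoto"]
cand_ids = [tok(c, add_special_tokens=False)["input_ids"][0] for c in candidates]

# Decode every layer's last-position hidden state through the final layer norm
# and unembedding, then print each candidate's logit per layer.  Under the
# promote-then-suppress account, all candidates should rise in middle layers and
# the already-generated " Tokyo" should be suppressed near the top.
final_norm = model.transformer.ln_f
unembed = model.lm_head
for layer, hidden in enumerate(out.hidden_states):
    logits = unembed(final_norm(hidden[0, -1]))
    scores = ", ".join(
        f"{c.strip()}: {logits[i].item():.2f}" for c, i in zip(candidates, cand_ids)
    )
    print(f"layer {layer:2d}  {scores}")
```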