Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Hamdani, Rajaa El, Haffoudhi, Samy, Holzenberger, Nils, Suchanek, Fabian, Bonald, Thomas, Malliaros, Fragkiskos D.
arXiv.org Artificial Intelligence
Language models (LMs) encode substantial factual knowledge, but often produce answers judged as incorrect. We hypothesize that many of these answers are actually correct, but are expressed in alternative surface forms that are dismissed due to an overly strict evaluation, leading to an underestimation of models' parametric knowledge. We propose Retrieval-Constrained Decoding (RCD), a decoding strategy that restricts model outputs to unique surface forms. We introduce YAGO-QA, a dataset of 19,137 general knowledge questions. Evaluating open-source LMs from 135M to 70B parameters, we show that standard decoding undervalues their knowledge. For instance, Llama-3.1-70B scores only 32.3% F1 with vanilla decoding but 46.0% with RCD. Similarly, Llama-3.1-8B reaches 33.0% with RCD, outperforming the larger model under vanilla decoding. We publicly share the code and dataset at https://github.com/Rajjaa/disambiguated-LLM.
Sep-30-2025
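The abstract's core idea, constraining decoding to a fixed set of unique surface forms, can be sketched with a token-level prefix trie over candidate answers, where each decoding step only considers tokens the trie allows. This is a minimal illustration of the general constrained-decoding technique; the names (`build_trie`, `constrained_greedy_decode`, `score_fn`) are hypothetical, and the paper's actual RCD algorithm may differ in its details.

```python
# Hypothetical sketch of decoding constrained to a fixed answer set.
# The trie stores each candidate as a token path; None marks end-of-answer.

def build_trie(candidates):
    """Build a nested-dict prefix trie from candidate token sequences."""
    trie = {}
    for tokens in candidates:
        node = trie
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[None] = None  # end-of-answer marker
    return trie

def constrained_greedy_decode(score_fn, trie):
    """Greedily pick, at each step, the best-scoring token the trie allows.

    score_fn(prefix) stands in for a language model: it maps the tokens
    generated so far to a dict of token -> score for the next position.
    (For simplicity, this sketch always extends when a candidate is a
    strict prefix of another, rather than comparing against stopping.)
    """
    output, node = [], trie
    while True:
        allowed = [t for t in node if t is not None]
        if not allowed:  # only the end marker remains: answer is complete
            return output
        scores = score_fn(output)
        best = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        output.append(best)
        node = node[best]
```

For example, with candidates `("New", "York")`, `("New", "Delhi")`, and `("Paris",)`, a scorer that favors "New" and then "Delhi" yields `["New", "Delhi"]`: the trie rules out every string outside the answer set, so the model's probability mass is only compared across valid surface forms.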