How Does Response Length Affect Long-Form Factuality
Zhao, James Xu, Liu, Jimmy Z. J., Hooi, Bryan, Ng, See-Kiong
arXiv.org Artificial Intelligence
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in their responses undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic, bi-level long-form factuality evaluation framework, which achieves high agreement with human annotations while being cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, in which the model gradually runs out of its more reliable knowledge, is the primary cause of factual degradation, rather than error propagation or long context.
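The abstract does not define factual precision. A common reading in long-form factuality work is the fraction of a response's extracted claims that a verifier judges to be supported. The sketch below assumes that reading. The function name `factual_precision` and the toy verdict lists are hypothetical, invented for illustration, and are not taken from the paper or its data.

```python
from typing import Optional, Sequence


def factual_precision(verdicts: Sequence[bool]) -> Optional[float]:
    """Fraction of claims judged supported (assumed definition).

    Each entry in `verdicts` is True if a claim extracted from the
    response was judged supported, False otherwise. Returns None when
    the response contains no claims, since precision is undefined.
    """
    if not verdicts:
        return None
    return sum(verdicts) / len(verdicts)


# Toy verdicts, invented for illustration only (not results from the paper).
short_response = [True, True, True, False]
long_response = [True, True, True, False, True, False, False, False]

print(factual_precision(short_response))  # 0.75
print(factual_precision(long_response))   # 0.5
```

Under this assumed definition, a longer response can contain more supported claims in absolute terms and still score lower, because the metric is a ratio rather than a count.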
May-30-2025
- Country:
  - Africa
    - Middle East > Egypt (0.04)
    - South Africa > Gauteng
      - Soweto (0.04)
  - Antarctica > West Antarctica
    - Antarctic Peninsula (0.04)
  - Asia
    - India
      - Andhra Pradesh (0.05)
      - Jammu and Kashmir (0.05)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe > Italy
    - Sardinia (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Ohio
        - Cuyahoga County
          - Cleveland (0.04)
          - East Cleveland (0.04)
        - Hamilton County > Cincinnati (0.04)
      - Virginia (0.04)
  - South America
    - Argentina > Pampas
      - Buenos Aires F.D. > Buenos Aires (0.04)
    - Brazil (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Industry:
  - Education (0.67)
  - Health & Medicine (1.00)
  - Law (1.00)
  - Leisure & Entertainment (1.00)
  - Media
- Technology: