Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models
Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu
Despite the remarkable advances of large language models (LLMs), non-factual responses remain a common issue. This work studies non-factuality prediction (NFP): predicting whether an LLM will generate a non-factual response to a question before generation begins. Previous efforts on NFP usually rely on extensive computation. In this work, we analyze the capability of a lightweight probe to elicit "whether an LLM knows" from the hidden representations of questions. Additionally, we discover that the non-factuality probe employs similar patterns for NFP across multiple LLMs. Motivated by this finding, we conduct effective transfer learning for cross-LLM NFP and propose a question-aligned strategy to ensure the efficacy of mini-batch-based training.
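To make the abstract's core idea concrete, the sketch below trains a lightweight probe on hidden question representations to score non-factuality before generation. It is a minimal sketch under stated assumptions, not the authors' implementation: the stand-in model (gpt2), the last-token/last-layer representation, the logistic-regression probe, and the toy question/label data are all illustrative choices.

```python
# Minimal sketch of a lightweight non-factuality probe. Assumptions (not the
# paper's exact setup): the probe reads the hidden state of the question's
# final token at the last layer, logistic regression suffices as the probe,
# and gpt2 stands in for a full-scale LLM.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # stand-in; the paper targets large language models

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def question_representation(question: str, layer: int = -1) -> torch.Tensor:
    """Hidden representation of a question: the last-token state at `layer`."""
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states is a tuple of (1, seq_len, hidden_dim) tensors, one per layer.
    return outputs.hidden_states[layer][0, -1]

# Labels (1 = the LLM answered non-factually) would come from checking the
# target LLM's answers against gold references; toy placeholders here.
questions = ["Who wrote Hamlet?", "What is the capital of Australia?",
             "When was the Eiffel Tower built?", "Who painted the Mona Lisa?"]
labels = [0, 1, 1, 0]

X = torch.stack([question_representation(q) for q in questions]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# Before generating an answer, the probe scores how likely the LLM is to
# respond non-factually to a given question.
print(probe.predict_proba(X)[:, 1])
```

The paper's cross-LLM transfer would additionally reuse such a probe on representations from a different LLM, trained with question-aligned mini-batches; that part is omitted from this sketch.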
arXiv.org Artificial Intelligence
Jun-7-2024