FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Neural Information Processing Systems
Assessing the factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. The evaluators that assess factuality, however, themselves require suitable evaluation to gauge progress and foster advances. This direction remains under-explored, substantially impeding progress on factuality evaluators. To mitigate this issue, we introduce FELM, a benchmark for Factuality Evaluation of large Language Models. In this benchmark, we collect responses generated by LLMs and annotate them with fine-grained factuality labels. In contrast to previous studies that concentrate primarily on the factuality of world knowledge (e.g., information from Wikipedia), FELM covers factuality across diverse domains, spanning from world knowledge to mathematics and reasoning.
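To make the segment-level annotation scheme concrete, the minimal sketch below models one annotated response and scores a hypothetical evaluator against the gold labels. The record fields (prompt, segments, labels) and the accuracy metric are illustrative assumptions, not FELM's actual schema or official scoring protocol.

```python
# Hypothetical sketch of fine-grained (segment-level) factuality annotation.
# Field names and the metric are assumptions for illustration only; they are
# not claimed to match the FELM dataset's real schema.

from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedResponse:
    prompt: str           # query sent to the LLM
    segments: List[str]   # the LLM response, split into text segments
    labels: List[bool]    # per-segment gold label: True = factual


def segment_accuracy(gold: List[bool], pred: List[bool]) -> float:
    """Fraction of segments where the evaluator's verdict matches the gold label."""
    assert len(gold) == len(pred), "one prediction per annotated segment"
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


example = AnnotatedResponse(
    prompt="Who wrote 'Pride and Prejudice'?",
    segments=[
        "'Pride and Prejudice' was written by Jane Austen.",
        "It was first published in 1820.",  # factual error: published in 1813
    ],
    labels=[True, False],
)

# A hypothetical evaluator that judged both segments factual:
predictions = [True, True]
print(f"segment accuracy: {segment_accuracy(example.labels, predictions):.2f}")
```

Annotating at the segment level, as above, lets a benchmark pinpoint which part of a response is wrong rather than assigning a single pass/fail label to the whole answer.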