openfactcheck
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Iqbal, Hasan, Wang, Yuxia, Wang, Minghan, Georgiev, Georgi, Geng, Jiahui, Gurevych, Iryna, Nakov, Preslav
The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/hasaniqbal777/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (https://huggingface.co/spaces/hasaniqbal777/OpenFactCheck). A video describing the system is available at https://youtu.be/-i9VKL0HleI.
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Wang, Yuxia, Wang, Minghan, Iqbal, Hasan, Georgiev, Georgi, Geng, Jiahui, Nakov, Preslav
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. The difficulty lies in assessing the factuality of free-form responses in open domains. In addition, different papers use disparate evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we propose OpenFactCheck, a unified factuality evaluation framework for LLMs. OpenFactCheck consists of three modules: (i) CUSTCHECKER, which allows users to easily customize an automatic fact-checker and verify the factual correctness of documents and claims; (ii) LLMEVAL, a unified evaluation framework that fairly assesses an LLM's factuality from various perspectives; and (iii) CHECKEREVAL, an extensible solution for gauging the reliability of automatic fact-checkers' verification results using human-annotated datasets. OpenFactCheck is publicly released at https://github.com/yuxiaw/OpenFactCheck.
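The idea behind CHECKEREVAL, gauging a fact-checker's verdicts against human-annotated labels, can be illustrated with a minimal sketch. This is not the OpenFactCheck API: the function name `score_checker`, the `CheckerScore` container, and the boolean label encoding (True for a factually correct claim, False for an incorrect one) are all assumptions made for illustration, and the metrics actually used by the framework may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CheckerScore:
    """Hypothetical container for per-class agreement metrics."""
    precision: float
    recall: float
    f1: float


def score_checker(
    predicted: Sequence[bool],
    gold: Sequence[bool],
    positive: bool = False,
) -> CheckerScore:
    """Compare a fact-checker's verdicts with human annotations.

    `positive` selects the class being scored. The default (False) scores
    how well the checker detects factually incorrect claims.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    pairs = list(zip(predicted, gold))
    # True positives: checker and annotator both assign the scored class.
    tp = sum(p == positive and g == positive for p, g in pairs)
    # False positives: checker assigns the scored class, annotator does not.
    fp = sum(p == positive and g != positive for p, g in pairs)
    # False negatives: annotator assigns the scored class, checker does not.
    fn = sum(p != positive and g == positive for p, g in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return CheckerScore(precision, recall, f1)


# Toy data (invented for illustration): five claims, one missed false claim.
gold_labels = [True, False, False, True, False]
checker_verdicts = [True, False, True, True, False]
score = score_checker(checker_verdicts, gold_labels)
print(score)
```

Scoring the "false claim" class separately is a design choice in this sketch: a checker that labels everything as true can look accurate on a dataset dominated by correct claims while detecting no errors at all.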