Auto-ARGUE: LLM-Based Report Generation Evaluation

Walden, William, Mason, Marc, Weller, Orion, Dietz, Laura, Conroy, John, Molino, Neil, Recknor, Hannah, Li, Bryan, Liu, Gabrielle Kaili-May, Hou, Yu, Lawrie, Dawn, Mayfield, James, Yang, Eugene

arXiv.org Artificial Intelligence 

Generation of long-form, citation-backed reports is a primary use case for retrieval-augmented generation (RAG) systems. While open-source evaluation tools exist for various RAG tasks, tools tailored to report generation (RG) are lacking. Accordingly, we introduce Auto-ARGUE, a robust LLM-based implementation of the recently proposed ARGUE framework for RG evaluation.