CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization

Ziwei Gong, Lin Ai, Harshsaiprasad Deshpande, Alexander Johnson, Emmy Phung, Zehui Wu, Ahmad Emami, Julia Hirschberg

arXiv.org Artificial Intelligence 

The rapid advancement of Large Language Models (LLMs) has significantly influenced the field of automatic evaluation for text summarization. LLMs offer the potential to streamline the evaluation process, making it faster and more cost-effective compared to traditional human evaluation (Liu et al., 2023; Wang et al., 2023). However, despite the progress in automatic evaluation techniques, existing methods primarily target general-purpose summarization tasks, which typically involve shorter, more straightforward text inputs, which may not

In this paper, we address this gap by developing a new evaluation framework tailored specifically for meeting summarization. We propose CREAM (Comparison-based Reference-free Elo-ranked Automatic evaluation for Meeting summarization), a novel system designed to fill the gaps in specialized and customizable evaluation for meeting summaries, as illustrated in Figure 1. Our research addresses the following key questions:

1. Do current LLM-based automatic evaluators work effectively for meeting summarization?
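To make the "Elo-ranked" component concrete, the sketch below shows how pairwise comparison verdicts (e.g., from an LLM judge asked to pick the better of two summaries) can be aggregated into a ranking with standard Elo updates. This is a minimal illustration under assumed defaults (initial rating 1000, K-factor 32); the function names, the judging setup, and these parameters are hypothetical and not taken from the paper.

```python
def elo_update(r_a, r_b, a_wins, k=32.0):
    """Return updated (r_a, r_b) after one pairwise comparison.

    Standard Elo: the expected score follows a logistic curve in the
    rating difference, and the winner gains what the loser sheds.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


def rank_summaries(names, comparisons, start=1000.0):
    """Rank candidate summaries from pairwise (winner, loser) verdicts.

    `comparisons` would come from a comparison-based judge; here it is
    just a list of decided pairs. Returns names sorted best-first.
    """
    ratings = {n: start for n in names}
    for winner, loser in comparisons:
        ratings[winner], ratings[loser] = elo_update(
            ratings[winner], ratings[loser], a_wins=True
        )
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical example: three systems judged on every pair.
order = rank_summaries(
    ["sys_A", "sys_B", "sys_C"],
    [("sys_A", "sys_B"), ("sys_A", "sys_C"), ("sys_B", "sys_C")],
)
# sys_A wins both its comparisons, sys_C loses both, so the
# resulting order is ["sys_A", "sys_B", "sys_C"].
```

Because Elo only consumes relative preferences, no reference summary is needed, which matches the reference-free framing above; the trade-off is that rankings depend on which pairs are compared and in what order.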