Multi-Dimensional Evaluation of Text Summarization with In-Context Learning

Jain, Sameer; Keshava, Vaishakh; Sathyendra, Swarnashree Mysore; Fernandes, Patrick; Liu, Pengfei; Neubig, Graham; Zhou, Chunting

Jun-1-2023–arXiv.org Artificial Intelligence

Evaluation of natural language generation (NLG) is complex and multi-dimensional. Generated text can be evaluated for fluency, coherence, factuality, or any other dimension of interest. Most frameworks that perform such multi-dimensional evaluation require training on large manually or synthetically generated datasets. In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning, obviating the need for large training datasets. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization, establishing state-of-the-art on dimensions such as relevance and factual consistency. We then analyze the effects of factors such as the selection and number of in-context examples on performance. Finally, we study the efficacy of in-context learning-based evaluators in evaluating zero-shot summaries written by large language models such as GPT-3.

artificial intelligence, large language model, natural language

Country:
- Asia > China (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
