On Learning to Summarize with Large Language Models as References

Liu, Yixin, Shi, Kejian, He, Katherine S, Ye, Longtian, Fabbri, Alexander R., Liu, Pengfei, Radev, Dragomir, Cohan, Arman

Nov-16-2023–arXiv.org Artificial Intelligence

Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we investigate a new learning setting for text summarization models that treats LLMs as the reference, or gold-standard oracle, on these datasets. To examine the standard practices that are aligned with this new learning setting, we investigate two LLM-based summary quality evaluation methods for model training and adopt a contrastive learning training method to leverage the LLM-guided learning signals. Our experiments on the CNN/DailyMail and XSum datasets demonstrate that smaller summarization models can achieve performance similar to that of LLMs under LLM-based evaluation. However, we found that the smaller models cannot yet reach LLM-level performance under human evaluation, despite promising improvements brought by our proposed training methods. Meanwhile, we perform a meta-analysis on this new learning setting that reveals a discrepancy between human and LLM-based evaluation, highlighting the benefits and risks of the LLM-as-reference setting we investigated.
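
As a rough illustration of the contrastive training idea mentioned in the abstract, the sketch below computes a pairwise margin ranking loss over candidate summaries that an LLM-based evaluator has already ordered from best to worst. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the margin value, and the convention that each score is the summarizer's own (for example, length-normalized log-probability) score for a candidate are all hypothetical.

```python
from typing import Sequence


def pairwise_ranking_loss(model_scores: Sequence[float], margin: float = 0.01) -> float:
    """Margin ranking loss over candidates sorted best-first by an LLM judge.

    model_scores[i] is the summarizer's own score for the i-th candidate;
    index 0 is the candidate the LLM evaluator ranked highest.
    Both the scoring convention and the default margin are assumptions.
    """
    loss = 0.0
    n = len(model_scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Candidate i is preferred over candidate j, so the model should
            # score it higher by a gap that grows with the rank distance.
            required_gap = margin * (j - i)
            loss += max(0.0, model_scores[j] - model_scores[i] + required_gap)
    return loss


# Scores that already agree with the LLM ranking incur no loss.
agreeing = pairwise_ranking_loss([-0.5, -1.0, -2.0])
# Scores in the opposite order are penalized on every pair.
reversed_order = pairwise_ranking_loss([-2.0, -1.0, -0.5])
```

Under this formulation the loss is zero only when the summarizer's scores reproduce the LLM-provided ordering with enough separation between candidates, which is one way an LLM's quality judgments could serve as the training signal for a smaller model.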

brio, chatgpt, evaluation

Country:
- North America
  - Dominican Republic (0.04)
  - Canada (0.04)
  - United States
    - New York (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
- Europe
  - Germany > Berlin (0.04)
  - France (0.04)
  - United Kingdom
    - Wales (0.04)
    - England (0.04)
    - Scotland (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Liguria > Genoa (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Atlantic Ocean > North Atlantic Ocean
  - English Channel (0.04)
- Asia
  - Russia (0.28)
  - Middle East > Jordan (0.04)
  - China
    - Yunnan Province (0.04)
    - Shanghai > Shanghai (0.04)
    - Hong Kong (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
- Law (0.93)
- Leisure & Entertainment > Sports (0.68)
- Government
  - Military (0.67)
  - Regional Government > Europe Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)