AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference

Han, Yang, Wang, Yiming, Wang, Rui, Chen, Lu, Yu, Kai

Oct-1-2024–arXiv.org Artificial Intelligence

Text summarization tasks commonly employ Pre-trained Language Models (PLMs) to fit diverse standard datasets. While these PLMs excel in automatic evaluations, they frequently underperform in human evaluations, indicating a gap between their generated summaries and human summarization preferences. This discrepancy is likely due to the low quality of fine-tuning datasets and the limited availability of high-quality human-annotated data that reflect true human preference. To address this challenge, we introduce AlignSum, a novel framework for aligning models with human summarization preferences. The framework consists of three parts. First, we construct a Data Pyramid from extractive, abstractive, and human-annotated summary data. Second, we apply Gaussian Resampling to remove summaries with extreme lengths. Finally, we perform two-stage hierarchical fine-tuning on the Data Pyramid after Gaussian Resampling. We apply AlignSum to PLMs on the human-annotated CNN/DailyMail and BBC XSum datasets. Experiments show that with AlignSum, PLMs like BART-Large surpass 175B GPT-3 in both automatic and human evaluations. This demonstrates that AlignSum significantly enhances the alignment of language models with human summarization preferences.
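The abstract says only that Gaussian Resampling removes summaries with extreme lengths; it does not give the procedure. As a rough illustration of the idea, and not the authors' method, the sketch below fits a mean and standard deviation to a set of reference summary lengths and keeps candidates whose word count falls within `k` standard deviations of that mean. The function name `gaussian_length_filter`, the whitespace word count, and the threshold `k=2.0` are all assumptions made for this example.

```python
import statistics


def gaussian_length_filter(summaries, reference_lengths, k=2.0):
    """Keep summaries whose word count lies within k standard deviations
    of the mean reference length.

    Hypothetical rule for illustration only; the paper's actual
    Gaussian Resampling procedure may differ.
    """
    mu = statistics.mean(reference_lengths)
    sigma = statistics.pstdev(reference_lengths)
    lower, upper = mu - k * sigma, mu + k * sigma
    # Length is measured as a whitespace-separated word count (an assumption).
    return [s for s in summaries if lower <= len(s.split()) <= upper]


# Toy data: reference lengths cluster around 20 words.
reference_lengths = [18, 20, 22, 19, 21]
candidates = [
    " ".join(["w"] * 3),   # too short
    " ".join(["w"] * 20),  # typical length
    " ".join(["w"] * 80),  # too long
]
kept = gaussian_length_filter(candidates, reference_lengths)
print(len(kept))
```

With these toy numbers only the 20-word candidate survives the filter. A real pipeline would more likely measure length in tokens and estimate the reference statistics from the human-annotated data.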

cnn dailymail, evaluation, summarization

Country:
- North America
  - United States
    - New Jersey (0.04)
    - Texas > Harris County
      - Houston (0.04)
    - California > San Francisco County
      - San Francisco (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > Ireland
  - Leinster > County Dublin > Dublin (0.04)
- Asia
  - Singapore (0.04)
  - Middle East
    - UAE
      - Fujairah Emirate > Fujairah (0.04)
      - Dubai Emirate > Dubai (0.04)
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Iran > Tehran Province
      - Tehran (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine (0.68)
- Law > Criminal Law (0.68)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.91)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
