OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
arXiv.org Artificial Intelligence
Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are publicly available at https://github.com/A-Chicharito-S/OpinSummEval/tree/main.
Nov-11-2023
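The paper's central measurement, correlating automatic metric scores with human ratings, can be illustrated with a minimal sketch. Spearman's rho is one of the standard correlation statistics used in such meta-evaluation studies; the summary scores below are invented for illustration and do not come from OpinSummEval.

```python
def ranks(values):
    """Return 1-based ranks (no tie handling, for illustration only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical human ratings and automatic metric scores for five summaries
human = [4.0, 3.5, 2.0, 5.0, 1.0]
metric = [0.62, 0.58, 0.31, 0.70, 0.25]
print(spearman(human, metric))  # perfectly monotone toy data -> 1.0
```

In practice, a study like this computes such a correlation per evaluation dimension (e.g. fluency, aspect coverage) for each of the 24 metrics against the human judgments, which is how dimension-specific weaknesses of metrics become visible.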