Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain

Yuan, Hongyi, Zhang, Yaoyun, Huang, Fei, Huang, Songfang

Mar-18-2023–arXiv.org Artificial Intelligence

Automatic evaluation metrics have facilitated the rapid development of automatic summarization methods by providing instant and fair assessments of summary quality. Most metrics were developed for the general domain, especially news and meeting notes, or for other language-generation tasks. However, these metrics are also applied to evaluate summarization systems in other domains, such as biomedical question summarization. To better understand whether commonly used evaluation metrics are capable of evaluating automatic summarization in the biomedical domain, we conduct human evaluations of summarization quality from four different aspects on a biomedical question summarization task. Based on the human judgments, we identify noteworthy features of both current automatic metrics and summarization systems. We also release a dataset of our human annotations to aid research on summarization evaluation metrics in the biomedical domain.

large language model, machine learning, natural language


Country:
- North America > United States
  - Pennsylvania (0.04)
  - Washington > King County
    - Seattle (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
- Europe
  - Czechia > Prague (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Large Language Model (0.46)
    - Generation (0.34)
