NumHG: A Dataset for Number-Focused Headline Generation
Huang, Jian-Tao, Chen, Chung-Chi, Huang, Hen-Hsen, Chen, Hsin-Hsi
arXiv.org Artificial Intelligence
Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models score well on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, NumHG, which provides over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in number-focused headline generation and stimulate further discussions in numeral-focused text generation.
Sep-4-2023
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Taiwan > Taiwan Province
      - Taipei (0.05)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Mexico (0.14)
    - United States > California
      - San Diego County > San Diego (0.04)
  - Oceania > Australia (0.04)
- Genre:
  - Research Report (1.00)
- Technology: