OASum: Large-Scale Open Domain Aspect-based Summarization
Yang, Xianjun, Song, Kaiqiang, Cho, Sangwoo, Wang, Xiaoyang, Pan, Xiaoman, Petzold, Linda, Yu, Dong
–arXiv.org Artificial Intelligence
Aspect- or query-based summarization has recently attracted growing attention, as it can generate differentiated summaries based on users' interests. However, existing datasets for aspect- or query-based summarization either focus on specific domains, contain relatively few instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its utility for generating diverse aspect-based summaries. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.
May-25-2023
- Country:
- Africa > Middle East
- Egypt (0.28)
- Asia > China (0.69)
- Europe (1.00)
- North America > United States
- Minnesota (0.28)
- Washington > King County (0.28)
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Government > Military (1.00)
- Leisure & Entertainment (1.00)
- Media (1.00)
- Transportation (1.00)
- Technology: