NewsQs: Multi-Source Question Generation for the Inquiring Mind
Alyssa Hwang, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba, Vittorio Castelli, Markus Dreyer, Mohit Bansal, Kathleen McKeown
arXiv.org Artificial Intelligence
We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show through human evaluation that fine-tuning the model with control codes produces questions that are judged acceptable more often than those from the same model fine-tuned without them. To filter our data, we use a QNLI model whose predictions correlate highly with human annotations. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
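The filtering step described in the abstract can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the `question` and `answer` field names, the `filter_pairs` helper, the 0.5 threshold, and the word-overlap `overlap_scorer` are all hypothetical. In particular, `overlap_scorer` is a toy stand-in for a real QNLI model, which would instead return the probability that the answer text addresses the question.

```python
from typing import Callable, Dict, List

QAPair = Dict[str, str]
Scorer = Callable[[str, str], float]


def overlap_scorer(question: str, answer: str) -> float:
    """Toy stand-in for a QNLI model (assumption, not the paper's model).

    Returns the fraction of distinct question words that also occur in the answer.
    """
    q_words = {w.strip("?.,").lower() for w in question.split()}
    a_words = {w.strip("?.,").lower() for w in answer.split()}
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


def filter_pairs(pairs: List[QAPair], scorer: Scorer, threshold: float = 0.5) -> List[QAPair]:
    """Keep only pairs whose score reaches the (hypothetical) threshold."""
    return [p for p in pairs if scorer(p["question"], p["answer"]) >= threshold]


pairs = [
    {"question": "Who won the election?",
     "answer": "The incumbent won the election by a wide margin."},
    {"question": "Who won the election?",
     "answer": "Rain is expected on Tuesday."},
]

kept = filter_pairs(pairs, overlap_scorer)
print(len(kept))  # only the first pair passes the threshold
```

Passing the scorer in as a function keeps the filter independent of any particular model, so a trained QNLI classifier could replace the stand-in without changing `filter_pairs`.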
Jun-15-2024
- Country:
  - Africa
  - Asia
    - Afghanistan (0.04)
    - Middle East > Syria (0.04)
  - Europe
    - United Kingdom (0.14)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Norway (0.04)
    - Finland (0.04)
    - Switzerland (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Iceland (0.04)
    - Italy > Tuscany
      - Florence (0.04)
  - North America > United States
    - Alaska (0.04)
    - California > Los Angeles County
      - Los Angeles (0.04)
    - Kentucky (0.04)
    - Minnesota (0.04)
    - North Carolina (0.04)
    - Pennsylvania (0.04)
  - Oceania > Australia
- Genre:
  - Research Report
    - Experimental Study (0.46)
    - New Finding (0.46)
- Industry:
  - Government > Voting & Elections (0.46)
- Technology: