Multi-VQG: Generating Engaging Questions for Multiple Images
Yeh, Min-Hsuan, Chen, Vincent, Huang, Ting-Hao 'Kenneth', Ku, Lun-Wei
arXiv.org Artificial Intelligence
Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for visual-and-language models to implicitly construct a story behind a series of photos to allow for creativity and experience sharing and hence draw attention to downstream applications.
Nov-17-2022
- Country:
  - Asia > Taiwan (0.04)
  - North America
    - Dominican Republic (0.04)
    - United States
      - Pennsylvania (0.04)
      - Michigan (0.04)
      - Texas > Travis County
        - Austin (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Massachusetts > Hampshire County
        - Amherst (0.04)
      - Illinois > Champaign County
        - Urbana (0.04)
  - Europe
- Genre:
  - Research Report > New Finding (0.66)
- Technology: