WebQA: Multihop and Multimodal QA

Chang, Yingshan, Narang, Mridu, Suzuki, Hisami, Cao, Guihong, Gao, Jianfeng, Bisk, Yonatan

Sep-21-2021, arXiv.org Artificial Intelligence

Web search is fundamentally multimodal and multihop. Often, even before asking a question, we choose to go directly to image search to find our answers. Further, we rarely find an answer in a single source; instead, we aggregate information and reason through its implications. Despite the frequency of this everyday occurrence, there is at present no unified question answering benchmark that requires a single model to answer long-form natural language questions from text and open-ended visual sources, akin to a human's experience. We propose to bridge this gap between the natural language and computer vision communities with WebQA. We show that (a) our multihop text queries are difficult for a large-scale transformer model, and (b) existing multimodal transformers and visual representations do not perform well on open-domain visual queries. Our challenge for the community is to create a unified multimodal reasoning model that seamlessly transitions and reasons regardless of the source modality.

Country:
- North America
  - Canada (0.04)
  - United States
    - Michigan (0.04)
    - New York > New York County
      - New York City (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
- Europe
  - Austria (0.04)
  - Czechia > Prague (0.04)
  - Spain > Catalonia (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > Japan
  - Honshū
    - Tōhoku > Miyagi Prefecture
      - Sendai (0.04)
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.04)
- Africa > Middle East
  - Egypt (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Information Management > Search (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Question Answering (0.70)
    - Machine Learning > Pattern Recognition (0.49)
    - Cognitive Science > Problem Solving (0.46)