Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering

Lerner, Paul, Ferret, Olivier, Guinaudeau, Camille

Jan-11-2023–arXiv.org Artificial Intelligence

We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Question Answering about named Entities (KVQAE). KVQAE is a recently introduced task that consists of answering questions about named entities grounded in a visual context using a Knowledge Base. The interaction between the modalities is therefore paramount for retrieving information and must be captured with complex fusion models. Because these models require a lot of training data, we derive this pre-training task from existing work in textual Question Answering. It treats a sentence as a pseudo-question and its context as a pseudo-relevant passage, and it is extended to the multimodal setting by considering images that appear near texts in multimodal documents. Our method is applicable to different neural network architectures and leads to a 9% relative-MRR and 15% relative-F1 gain for retrieval and reading comprehension, respectively, over a no-pre-training baseline.
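
The pseudo-question construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `IctExample` class, the `make_ict_examples` function, and the way images are attached (the caller passes one image identifier for the query side and one for the passage side) are all hypothetical names and assumptions made for illustration. The sketch also always removes the chosen sentence from its context, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IctExample:
    """One pseudo-training pair (hypothetical container, not from the paper)."""
    query_text: str               # sentence used as the pseudo-question
    query_image: Optional[str]    # assumed: identifier of an image near the sentence
    passage_text: str             # remaining context, the pseudo-relevant passage
    passage_image: Optional[str]  # assumed: identifier of an image near the context


def make_ict_examples(
    sentences: List[str],
    query_image: Optional[str] = None,
    passage_image: Optional[str] = None,
) -> List[IctExample]:
    """Turn each sentence of a passage into a (pseudo-question, passage) pair."""
    examples = []
    for i, sentence in enumerate(sentences):
        # The context is every other sentence of the passage.
        context = sentences[:i] + sentences[i + 1:]
        if not context:
            # A single-sentence passage has no context to retrieve.
            continue
        examples.append(
            IctExample(
                query_text=sentence,
                query_image=query_image,
                passage_text=" ".join(context),
                passage_image=passage_image,
            )
        )
    return examples


if __name__ == "__main__":
    passage = [
        "The tower was completed in 1889.",
        "It is located in Paris.",
        "It was named after its engineer.",
    ]
    for example in make_ict_examples(passage, "query.jpg", "passage.jpg"):
        print(example.query_text, "->", example.passage_text)
```

Each resulting pair could then serve as a positive example for a retriever, with the other passages in a batch acting as negatives; that training setup is likewise an assumption here, not something the abstract specifies.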

machine learning, natural language, question answering

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Oceania > New Zealand
  - South Island > Marlborough District > Blenheim (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.05)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Ukraine > Kyiv Oblast
    - Kyiv (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Bourgogne-Franche-Comté
    - Doubs > Besançon (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Government > Regional Government (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Representation & Reasoning
    - Expert Systems (1.00)
    - Information Fusion (0.66)
  - Natural Language
    - Text Processing (1.00)
    - Question Answering (0.83)
