Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection

Napolitano, Davide, Vaiani, Lorenzo, Cagliero, Luca

Oct-13-2023–arXiv.org Artificial Intelligence

The Document-based Visual Question Answering competition addresses the automatic detection of parent-child relationships between elements in multi-page documents. The goal is to identify the document elements that answer a specific question posed in natural language. This paper describes PoliTo's approach to the task. Our best solution is text-only and relies on an ad hoc sampling strategy: we fine-tune a BERT model with the Masked Language Modeling technique, focusing on sentences that contain sensitive keywords also occurring in the questions, such as references to tables or images. This approach achieves high performance relative to the baselines, showing that our solution contributes positively to the task.
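
The keyword-driven sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the function names, and the exact-match tokenization are all assumptions made for illustration, since the abstract only mentions "references to tables or images". The sketch keeps the sentences that share a sensitive keyword with at least one question; the selected sentences would then feed a Masked Language Modeling fine-tuning step, which is omitted here.

```python
import re

# Hypothetical keyword list (an assumption): the abstract only says the
# keywords include references to tables or images.
SENSITIVE_KEYWORDS = {"table", "figure", "image"}


def tokenize(text):
    """Lowercase a string and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keywords_in_questions(questions, keywords=SENSITIVE_KEYWORDS):
    """Return the sensitive keywords that occur in at least one question."""
    seen = set()
    for question in questions:
        seen.update(keywords.intersection(tokenize(question)))
    return seen


def select_sentences(sentences, questions, keywords=SENSITIVE_KEYWORDS):
    """Keep sentences containing a sensitive keyword that also occurs in the questions.

    Matching is exact on lowercase tokens, so plural forms such as "tables"
    are not matched in this simplified sketch.
    """
    active = keywords_in_questions(questions, keywords)
    return [s for s in sentences if active.intersection(tokenize(s))]


if __name__ == "__main__":
    questions = ["Which paragraphs describe Table 2?"]
    sentences = [
        "Table 2 reports the accuracy.",
        "We thank the reviewers.",
        "Figure 1 shows the model.",
    ]
    # Only "table" occurs in the questions, so the "Figure 1" sentence is skipped.
    print(select_sentences(sentences, questions))
```

Filtering on keywords that appear in the questions, rather than on the full keyword list, keeps the sample focused on the kinds of references the questions actually ask about.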

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.04)
- Europe
  - Switzerland (0.04)
  - United Kingdom > England
    - West Midlands > Birmingham (0.06)
  - Italy > Piedmont
    - Turin Province > Turin (0.05)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.64)
