Abduction of Domain Relationships from Data for VQA

Chowdhury, Al Mehdi Saadat, Shakarian, Paulo, Simari, Gerardo I.

arXiv.org Artificial Intelligence 

Visual Question Answering (VQA) is an AI task designed to reason about images. Commonly, the image is transformed into a "scene graph" that enables the deployment of more formal reasoning tools. For example, in recent work, both the scene graph and the associated query were represented as an ASP program [2, 1]; notably, however, the scene graph itself only contains information about the scene and lacks commonsense knowledge, in particular knowledge about the domains of attributes identified in the scene. Existing work addresses this shortcoming by leveraging large commonsense knowledge graphs to obtain domain knowledge [5, 6, 7]. However, such approaches require the ability to accurately align the language of the knowledge graph with the language of the scene graph. Further, for some applications, alignment does not guarantee that the knowledge graph will improve VQA performance (e.g., if domain knowledge relevant to the queries is not contained in the knowledge graph). In this paper, we provide an orthogonal and complementary approach that leverages logical representations of the scene graph and query to abduce domain relationships that can improve query answering performance. We frame the abduction problem and provide a simple algorithm that produces a valid solution. We also provide an implementation and show on a standard dataset that we can improve question answering accuracy from 59.98% to 81.01%, and achieve comparable results even with few historical examples.
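The core idea above can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hypothetical Python example assuming that attribute-value observations have already been extracted from historical scene graphs, and that an abduced domain fact simply asserts that an observed value belongs to the domain of its attribute. The rendered facts could then be appended to an ASP program alongside the scene graph.

```python
# Illustrative sketch (assumed setup, not the paper's method): abduce
# domain relationships for attributes from scene-graph-style observations.
# Each observation pairs an attribute name with a value seen in a scene
# graph, e.g. ("color", "red"). The abduced hypothesis is that each
# observed value belongs to the domain of its attribute.

from collections import defaultdict


def abduce_domains(observations):
    """Return a mapping attribute -> set of abduced domain values."""
    domains = defaultdict(set)
    for attribute, value in observations:
        domains[attribute].add(value)
    return dict(domains)


def as_asp_facts(domains):
    """Render abduced domain relationships as ASP-style facts."""
    facts = []
    for attribute in sorted(domains):
        for value in sorted(domains[attribute]):
            facts.append(f"domain({attribute},{value}).")
    return facts


# Observations gathered from historical (scene graph, answer) pairs.
obs = [("color", "red"), ("color", "blue"), ("shape", "cube"), ("color", "red")]
domains = abduce_domains(obs)
facts = as_asp_facts(domains)
```

The facts produced (e.g., `domain(color,red).`) supply exactly the kind of attribute-domain knowledge the abstract notes is missing from a raw scene graph.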
