Reviews: Chain of Reasoning for Visual Question Answering

Oct-7-2024, 07:51:46 GMT–Neural Information Processing Systems

Paper Summary: This paper presents a novel approach that performs chains of reasoning at the object level to generate answers for visual question answering. Object-level visual embeddings are first extracted with object detection networks to serve as the visual representation, and a sentence embedding of the question serves as the question representation. Based on these, a sequential model performs multiple steps of relational inference over (compound) object embeddings, guided by the question, to obtain a representation for each sub-chain of inference. The concatenation of these representations is then used for answer classification. Extensive experiments on four public datasets show that the method achieves state-of-the-art performance on all of them.
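To make the pipeline in the summary more concrete, here is a toy NumPy sketch of question-guided, multi-step relational inference that forms compound objects, pools one representation per step, concatenates them, and classifies the answer. It is not the authors' implementation. The relation scoring, the tanh compound-object update, the attention pooling, the weight names (`W_q`, `W_c`, `W_out`), and all dimensions are hypothetical choices made for illustration, and the weights are random and untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def chain_of_reasoning_sketch(objects, question, num_steps, params):
    """Toy question-guided multi-step relational inference (illustrative only).

    objects:  (M, d) object-level visual embeddings, e.g. from a detector.
    question: (d,)   sentence embedding of the question.
    params:   dict of hypothetical weight matrices (see make_params).
    Returns a probability distribution over candidate answers.
    """
    current = objects  # (M, d); refined into "compound" objects at each step
    step_reprs = []
    for t in range(num_steps):
        # Pairwise relation scores between current and base objects,
        # modulated by a question-dependent gate (hypothetical form).
        q_gate = np.tanh(params["W_q"][t] @ question)  # (d,)
        scores = (current * q_gate) @ objects.T        # (M, M)
        relation = softmax(scores, axis=1)             # each row sums to 1
        # Compound objects: mix each current object with its related base objects.
        current = np.tanh(relation @ objects @ params["W_c"][t] + current)  # (M, d)
        # Question-guided attention pooling: one representation per reasoning step.
        att = softmax(current @ question)              # (M,)
        step_reprs.append(att @ current)               # (d,)
    # Concatenate the per-step representations, then classify the answer.
    feature = np.concatenate(step_reprs)               # (num_steps * d,)
    logits = params["W_out"] @ feature                 # (num_answers,)
    return softmax(logits)


def make_params(d, num_steps, num_answers, seed=0):
    # Random, untrained weights; shapes chosen only to make the sketch run.
    rng = np.random.default_rng(seed)
    return {
        "W_q": rng.normal(scale=0.1, size=(num_steps, d, d)),
        "W_c": rng.normal(scale=0.1, size=(num_steps, d, d)),
        "W_out": rng.normal(scale=0.1, size=(num_answers, num_steps * d)),
    }


# Usage: 5 detected objects, 8-dim embeddings, 3 reasoning steps, 10 answers.
rng = np.random.default_rng(1)
objs = rng.normal(size=(5, 8))
q = rng.normal(size=8)
probs = chain_of_reasoning_sketch(objs, q, num_steps=3, params=make_params(8, 3, 10))
print(probs.shape)  # (10,)
```

Keeping one pooled vector per step and concatenating them mirrors the summary's description of using every sub-chain's representation for the final answer, rather than only the last step's output.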

clevr, reasoning, supplementary material

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)