Reviews: Chain of Reasoning for Visual Question Answering
–Neural Information Processing Systems
Paper Summary: This paper presents a novel approach that performs a chain of reasoning at the object level to generate answers for visual question answering. Object-level visual embeddings are first extracted with an object detection network as the visual representation, and a sentence embedding of the question is extracted as the question representation. Based on these, a sequential model performs multiple steps of relational inference over (compound) object embeddings, guided by the question, to obtain the final representation for each sub-chain inference. A concatenation of these embeddings is then used for answer classification. Extensive experiments on four public datasets show that the approach achieves state-of-the-art performance on all of them.
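To make the summarized pipeline concrete, here is a minimal sketch of the data flow in plain Python. It is not the authors' formulation: the dot-product attention, the tanh combination used to form compound objects, and all function names (`attend`, `relate`, `chain_of_reasoning`, `classify`) are assumptions chosen only for illustration. It shows question-guided attention over object embeddings, repeated relational steps that produce compound objects, and concatenation of the per-step summaries before answer classification.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(objects, guide):
    """Guided attention (assumed dot-product form): weighted sum of objects."""
    scores = [sum(o * g for o, g in zip(obj, guide)) for obj in objects]
    weights = softmax(scores)
    dim = len(guide)
    return [sum(w * obj[k] for w, obj in zip(weights, objects)) for k in range(dim)]


def relate(current, initial, question):
    """Form compound objects by relating each current object to the initial ones.

    The question gates each object before it attends over the initial objects;
    this gating and the tanh merge are illustrative assumptions.
    """
    compound = []
    for obj in current:
        guided = [o * q for o, q in zip(obj, question)]
        context = attend(initial, guided)
        compound.append([math.tanh(o + c) for o, c in zip(obj, context)])
    return compound


def chain_of_reasoning(objects, question, steps=3):
    """Run several reasoning steps and concatenate the per-step summaries."""
    current = objects
    features = []
    for _ in range(steps):
        features.extend(attend(current, question))  # summary of this sub-chain
        current = relate(current, objects, question)  # next-level compound objects
    return features


def classify(features, weight_rows):
    """Linear classifier over the concatenated features, followed by softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weight_rows]
    return softmax(logits)


if __name__ == "__main__":
    random.seed(0)
    dim, num_objects, num_answers, steps = 4, 5, 3, 3
    # Random stand-ins for detector features and the question embedding.
    objects = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_objects)]
    question = [random.uniform(-1, 1) for _ in range(dim)]
    weights = [[random.uniform(-1, 1) for _ in range(dim * steps)] for _ in range(num_answers)]
    features = chain_of_reasoning(objects, question, steps)
    print(len(features), classify(features, weights))
```

In a real system the object embeddings would come from a trained detector and every projection would be learned. The sketch only fixes the shapes: `steps` summaries of size `dim` are concatenated into one feature vector for the classifier.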
Oct-7-2024, 07:51:46 GMT