
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Neural Information Processing Systems

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs in a symbolic space is more robust to long program traces; our model solves complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data- and memory-efficient: it performs well after learning from a small amount of training data, and it can encode an image into a compact representation that requires less storage than existing methods for offline question answering. Third, symbolic program execution offers full transparency to the reasoning process; we are thus able to interpret and diagnose each execution step.
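To make the "execute the program on the scene representation" step concrete, here is a minimal Python sketch of the symbolic execution stage only. It assumes the scene has already been recovered as a table of per-object attributes and the question has already been parsed into a chain of modules; the perception and parsing networks are not shown. The module names (`scene`, `filter_color`, `filter_shape`, `count`, `query_material`) and the example scene are hypothetical, only loosely modeled on CLEVR-style functional programs, and this is not the authors' implementation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical structural scene representation: one attribute dict per object,
# loosely modeled on CLEVR attributes (shape, color, size, material).
Scene = List[Dict[str, str]]
# A program is a chain of (module name, argument) steps; each step consumes the
# previous step's output. Tree-shaped programs (e.g. comparisons) are omitted.
Program = List[Tuple[str, Optional[str]]]


def _scene(scene: Scene, _arg: Optional[str], _prev: Any) -> List[int]:
    """Start from the set of all object indices."""
    return list(range(len(scene)))


def _filter(attr: str) -> Callable[[Scene, Optional[str], List[int]], List[int]]:
    """Build a module that keeps objects whose `attr` equals the step argument."""
    def run(scene: Scene, value: Optional[str], objs: List[int]) -> List[int]:
        return [i for i in objs if scene[i][attr] == value]
    return run


def _count(_s: Scene, _arg: Optional[str], objs: List[int]) -> int:
    """Return how many objects survived the previous steps."""
    return len(objs)


def _query(attr: str) -> Callable[[Scene, Optional[str], List[int]], str]:
    """Build a module that returns `attr` of the single remaining object."""
    def run(scene: Scene, _arg: Optional[str], objs: List[int]) -> str:
        if len(objs) != 1:
            raise ValueError(f"query_{attr} expects one object, got {len(objs)}")
        return scene[objs[0]][attr]
    return run


MODULES: Dict[str, Callable[..., Any]] = {
    "scene": _scene,
    "filter_color": _filter("color"),
    "filter_shape": _filter("shape"),
    "count": _count,
    "query_material": _query("material"),
}


def execute(program: Program, scene: Scene) -> Tuple[Any, List[Tuple[str, Optional[str], Any]]]:
    """Run each step in order; return the answer plus a per-step trace."""
    state: Any = None
    trace: List[Tuple[str, Optional[str], Any]] = []
    for name, arg in program:
        state = MODULES[name](scene, arg, state)
        trace.append((name, arg, state))  # record every intermediate result
    return state, trace


scene: Scene = [
    {"shape": "cube", "color": "red", "size": "large", "material": "rubber"},
    {"shape": "sphere", "color": "red", "size": "small", "material": "metal"},
    {"shape": "cylinder", "color": "blue", "size": "large", "material": "metal"},
]

# "How many red things are there?"
answer, trace = execute([("scene", None), ("filter_color", "red"), ("count", None)], scene)
for step in trace:
    print(step)
print(answer)
```

Running it prints one trace entry per step, `('scene', None, [0, 1, 2])`, `('filter_color', 'red', [0, 1])` and `('count', None, 2)`, followed by the answer `2`. Keeping every intermediate object set is what makes this style of executor inspectable step by step, which is the kind of transparency the abstract describes.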


Reviews: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

This paper uses neural networks to parse visual scenes and language queries, transforming them into a logical representation that can be used to compute the answer to the query on the scene. The logical representation is learned through a combination of direct supervision on a small number of program traces and fine-tuning with end-to-end reinforcement learning. Advantages over existing approaches include fewer required training examples, a more interpretable inference process, and substantially higher accuracy. The overall approach shows great promise for improving the performance of neural architectures by incorporating a symbolic component, and for making them more robust, interpretable, and debuggable. So I think this is a good direction for AI research to go in.
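The fine-tuning step the review mentions rewards the question parser only when the program it produces executes to the correct answer. The toy sketch below illustrates that idea with the REINFORCE policy-gradient rule; it is not the authors' code. Assumptions made for illustration: a table of logits over three hypothetical candidate programs stands in for the neural parser, the reward is 1 when the executed program's answer matches the ground truth and 0 otherwise, the moving-average baseline is a variance-reduction choice of this sketch, and training starts from uniform logits rather than from a parser pretrained on supervised traces.

```python
import math
import random
from typing import List

# Hypothetical scene and candidate programs for "How many red things are there?".
scene = [{"color": "red"}, {"color": "red"}, {"color": "blue"}]
candidates = {
    "count(filter_color[red](scene))": lambda s: sum(o["color"] == "red" for o in s),
    "count(filter_color[blue](scene))": lambda s: sum(o["color"] == "blue" for o in s),
    "count(scene)": lambda s: len(s),
}
names = list(candidates)
gold_answer = 2


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over program logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(logits: List[float], steps: int = 500, lr: float = 0.5, seed: int = 0) -> List[float]:
    """REINFORCE with a moving-average baseline; reward = executed answer is correct."""
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a program from the current policy.
        a = rng.choices(range(len(names)), weights=probs)[0]
        # Execute it symbolically and compare with the ground-truth answer.
        reward = 1.0 if candidates[names[a]](scene) == gold_answer else 0.0
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. logit k is 1[k == a] - p_k.
        for k in range(len(logits)):
            logits[k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
        baseline = 0.9 * baseline + 0.1 * reward
    return logits
```

After `reinforce([0.0, 0.0, 0.0])`, `softmax` of the returned logits places most of the probability on the red-counting program, the only candidate whose execution earns a reward. Note that no program-level labels are used during this phase; the answer alone drives the update, which is why a small amount of supervised pretraining followed by this kind of fine-tuning can be data-efficient.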
