Chain of Reasoning for Visual Question Answering

Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong

Neural Information Processing Systems 

Reasoning plays an essential role in Visual Question Answering (VQA). Multi-step and dynamic reasoning is often necessary to answer complex questions. For example, the question "What is placed next to the bus on the right of the picture?" refers to a compound object, "bus on the right," which is generated by the relation ⟨bus, on the right of, picture⟩. A new relation involving this compound object, ⟨sign, next to, bus on the right⟩, is then required to infer the answer. However, previous methods support either one-step or static reasoning, without updating relations or generating compound objects.
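The two-step example can be made concrete with a toy symbolic sketch. This is not the paper's neural model. It only shows how the output of one relation (a compound object) becomes the input of the next. The scene facts, object ids, and the helper names `relate` and `of_category` are all hypothetical and are used here only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy scene: (subject_id, predicate, object_id) facts.
Fact = Tuple[str, str, str]

SCENE: Set[Fact] = {
    ("bus_1", "on the right of", "picture"),
    ("bus_2", "on the left of", "picture"),
    ("sign_1", "next to", "bus_1"),
    ("tree_1", "next to", "bus_2"),
}

# Hypothetical category label for each object id.
CATEGORY: Dict[str, str] = {
    "bus_1": "bus", "bus_2": "bus", "sign_1": "sign",
    "tree_1": "tree", "picture": "picture",
}


def of_category(cat: str) -> FrozenSet[str]:
    """Return the ids of all objects with the given category."""
    return frozenset(k for k, v in CATEGORY.items() if v == cat)


def relate(subjects: Iterable[str], predicate: str,
           objects: Iterable[str], scene: Set[Fact]) -> FrozenSet[str]:
    """One reasoning step: keep the subjects linked to any of `objects`
    by `predicate`. The result is a new (compound) object set that the
    next step can use as its own subject or object."""
    subj, obj = set(subjects), set(objects)
    return frozenset(s for (s, p, o) in scene
                     if s in subj and p == predicate and o in obj)


# Step 1: <bus, on the right of, picture> yields the compound object "bus on the right".
bus_on_right = relate(of_category("bus"), "on the right of", {"picture"}, SCENE)

# Step 2: <?, next to, bus on the right> reuses the compound object from step 1.
answer = relate(CATEGORY.keys(), "next to", bus_on_right, SCENE)

print(sorted(CATEGORY[a] for a in answer))  # ['sign']
```

A one-step method would have to answer with a single relation over the original objects. Here, the intermediate set `bus_on_right` is exactly the kind of compound object that static reasoning cannot construct.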