Supervising the Transfer of Reasoning Patterns in VQA

Neural Information Processing Systems 

We proceed along the lines of the proof of Theorem 5.1 in [

Given a set of i.i.d. data samples

For first introduce some notation.

From the proof of Theorem 5.1 in [11], we also know that H

This finishes the proof for the case p = 1.

Let us consider the case of p = 2l + 1.

This finishes the proof for the case p = 2l + 1.

We provide more details on the program decoder architecture.