Neural Information Processing Systems 

Note that the VQA results in Table 2 with continuous attention use fewer basis functions than discrete regions.

Good idea; we will add this to the camera-ready version.

"Is this a necessary or a sufficient condition?" Sufficient; we will clarify this and follow the suggestions (move the beta-escort definition to the main text and fix typos).

We will add a citation.

We chose ridge regression because it admits a closed-form solution that is linear in the basis functions (Eq.

We haven't tried linear interpolation.

However, for a high-level vision system, combining our method with BUTD is an interesting idea.

"Text are naturally discrete tokens."
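To illustrate the ridge regression point above, here is a minimal sketch of a closed-form fit whose result is a linear combination of basis functions. It assumes Gaussian radial basis functions on a 1D domain and scalar values at each discrete location; the basis choice, the function names (`gaussian_basis`, `ridge_fit`), and all hyperparameter values are illustrative assumptions rather than the paper's actual setup.

```python
import math

def gaussian_basis(t, centers, width):
    """Evaluate the Gaussian RBFs psi_j(t) at a scalar location t (assumed basis)."""
    return [math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(locations, values, centers, width, lam):
    """Closed-form ridge coefficients b = (F^T F + lam I)^{-1} F^T y."""
    F = [gaussian_basis(t, centers, width) for t in locations]  # L x N design matrix
    n = len(centers)
    gram = [[sum(row[i] * row[j] for row in F) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y for row, y in zip(F, values)) for i in range(n)]
    return solve(gram, rhs)

def evaluate(t, coeffs, centers, width):
    """Continuous approximation: a linear combination of the basis functions."""
    return sum(b * p for b, p in zip(coeffs, gaussian_basis(t, centers, width)))

# Hypothetical toy data: 10 discrete locations approximated with 5 basis functions.
locations = [i / 9 for i in range(10)]
values = [math.sin(2 * math.pi * t) for t in locations]
centers = [j / 4 for j in range(5)]
coeffs = ridge_fit(locations, values, centers, width=0.2, lam=1e-3)
```

Because the coefficients depend linearly on the observed values, the continuous approximation can be evaluated at any location with `evaluate(t, coeffs, centers, 0.2)`, and the number of basis functions can be smaller than the number of discrete locations.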