
Figure 6: Graph-Q-SAT inference time depends linearly on the number of vertices in the graph.

Neural Information Processing Systems

Figure 5: Graph-Q-SAT's MRIR improvement (10 model calls) translates into a reduction in wall-clock time.

We call the middle part 'the core'. The output of the core is concatenated with the output of the encoder and fed to the core again. We also plan to release the experimental code and the modified version of MiniSat for use as a gym environment. Encoder and Decoder are independent graph networks, i.e.
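The encoder, core, and decoder loop described above can be sketched in pure Python as follows. This is a minimal illustration under loud assumptions, not Graph-Q-SAT's actual model. The message-passing rule (concatenate a node's state with the sum of its neighbours' states, apply a shared linear map, then ReLU), the constant weights, the latent size, and the number of core steps are all hypothetical. The names `message_pass`, `encode_process_decode`, and `const_matrix` are invented here. The part taken from the text is the data flow: the encoder and decoder have their own separate weights, and at every step the core receives the encoder output concatenated with its own previous output.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]            # a matrix is a list of rows
Edges = List[Tuple[int, int]]    # undirected edges (u, v)


def const_matrix(rows: int, cols: int, value: float) -> Matrix:
    """Hypothetical weights: every entry is the same constant."""
    return [[value] * cols for _ in range(rows)]


def linear(x: Vector, weight: Matrix) -> Vector:
    """Matrix-vector product; every row of `weight` must match len(x)."""
    for row in weight:
        assert len(row) == len(x), "weight/input dimension mismatch"
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def message_pass(states: List[Vector], edges: Edges, weight: Matrix,
                 activation: bool = True) -> List[Vector]:
    """One simplified graph-network step (an assumption, not the paper's layer):
    each node concatenates its own state with the sum of its neighbours'
    states and applies a shared linear map, optionally followed by ReLU."""
    dim = len(states[0])
    agg = [[0.0] * dim for _ in states]
    for u, v in edges:  # one pass over the edge list
        for k in range(dim):
            agg[u][k] += states[v][k]
            agg[v][k] += states[u][k]
    out = [linear(s + a, weight) for s, a in zip(states, agg)]  # s + a concatenates
    if activation:
        out = [[max(0.0, x) for x in row] for row in out]
    return out


def encode_process_decode(features: List[Vector], edges: Edges, enc_w: Matrix,
                          core_w: Matrix, dec_w: Matrix,
                          core_steps: int) -> List[Vector]:
    """Encoder -> core (applied core_steps times) -> decoder."""
    encoded = message_pass(features, edges, enc_w)
    latent = encoded
    for _ in range(core_steps):
        # Core input: encoder output concatenated with the previous core output.
        core_input = [e + h for e, h in zip(encoded, latent)]
        latent = message_pass(core_input, edges, core_w)
    # Decoder: its own weights, no ReLU, so outputs may be negative.
    return message_pass(latent, edges, dec_w, activation=False)


if __name__ == "__main__":
    # Hypothetical toy data: path graph 0-1-2, two input features per node.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    edges = [(0, 1), (1, 2)]
    latent_dim, out_dim = 4, 2
    enc_w = const_matrix(latent_dim, 2 * 2, 0.1)                # own + neighbour sum
    core_w = const_matrix(latent_dim, 2 * 2 * latent_dim, 0.1)  # (encoded ++ latent), doubled
    dec_w = const_matrix(out_dim, 2 * latent_dim, 0.1)
    out = encode_process_decode(feats, edges, enc_w, core_w, dec_w, core_steps=3)
    print(len(out), len(out[0]))  # prints "3 2": 3 nodes, 2 outputs each
```

In this sketch, each `message_pass` call loops once over the nodes and once over the edge list. A forward pass therefore costs O(core_steps · (V + E)) for fixed feature sizes. This is consistent with, though no proof of, the linear scaling in the number of vertices shown in Figure 6.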


Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Neural Information Processing Systems

For VaTeX captioning and retrieval, we use the latest version (v1.1), which contains 25,991 videos for training and 6,000 videos for public testing. The statistics can be found in Table 1. Visual Genome synsets are key-value pairs, where the keys are noisy natural-language phrases and the values are the WordNet synsets they map to [6]. If a visual token occurs in multiple frames, we use its averaged frame index as its temporal indicator. Specifically, for UniVL, we set the number of epochs to 50 and the number of linear warmup steps to 40.
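The averaged-frame-index rule for temporal indicators can be illustrated with a short sketch. It rests on several assumptions not stated in the text. Each frame is assumed to have already been converted into a list of visual-token strings. Frames are indexed from 0, and a token repeated within a single frame is counted once. The helper name `temporal_indicators` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def temporal_indicators(frame_tokens: List[List[str]]) -> Dict[str, float]:
    """Hypothetical helper: map each visual token to the average (0-based)
    index of the frames in which it appears."""
    frames_seen: Dict[str, List[int]] = defaultdict(list)
    for idx, tokens in enumerate(frame_tokens):
        # Assumption: a token repeated inside one frame counts once for that frame.
        for tok in dict.fromkeys(tokens):  # de-duplicates while keeping order
            frames_seen[tok].append(idx)
    return {tok: sum(ids) / len(ids) for tok, ids in frames_seen.items()}


if __name__ == "__main__":
    frames = [["dog", "ball"], ["dog"], ["dog", "grass"]]
    print(temporal_indicators(frames))
    # {'dog': 1.0, 'ball': 0.0, 'grass': 2.0}
```

A token seen in a single frame simply keeps that frame's index. A token spanning the whole clip lands near the middle, so the indicator summarizes *when* the token is present rather than how often.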