REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering (Supplementary Materials)

Neural Information Processing Systems

A Overview

In the supplementary materials, we provide the following sections: (a) implementation details of implicit knowledge retrieval in Section B; (b) ablation study experiments in Section C; (c) visualization results in Section D.

We first describe more implementation details of implicit knowledge retrieval of the proposed REVIVE. Specifically, we explain how we extract multiple answer candidates. ... PICa's multi-query ensemble approach, we take all these ... In our experiments, we just retrieve 5 (i.e., ...

When using only one implicit knowledge candidate, the model can achieve 55.8% accuracy, ... However, when the retrieved candidate number is 8, we can see that the performance isn't the best, ...

Table 3: Ablation study on using different object detectors. ... R-CNN (R101) mean using ResNet-50 [2] and ResNet-101 [2] as backbones.

We can see that Faster R-CNN with ResNet-50 and ResNet-101 as the backbone can achieve 55.3% and 55.6% accuracy respectively, and using the GLIP as the object detector can achieve the optimal ...

... Conference on Computer Vision and Pattern Recognition, pages 3195-3204, 2019.

Figure 1: The implicit knowledge retrieval visualization results without and with the proposed regional descriptions/tags.
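The implementation details above mention pooling the predictions of a multi-query ensemble and keeping 5 implicit knowledge candidates. The following is a minimal sketch of one way such pooling could look, not the authors' implementation. The function name `pool_candidates`, the `(answer, score)` input format, the lowercase normalisation, and the rule of ranking by the highest score per answer are all assumptions made for illustration; the surviving text does not specify how candidates are scored or deduplicated.

```python
from typing import Dict, List, Tuple


def pool_candidates(
    per_query: List[List[Tuple[str, float]]], k: int = 5
) -> List[Tuple[str, float]]:
    """Merge (answer, score) pairs from several queries and keep the k best.

    Hypothetical helper: the score is assumed to be "higher is better"
    (for example a log-probability); this is not stated in the source.
    """
    best: Dict[str, float] = {}
    for predictions in per_query:
        for answer, score in predictions:
            # Assumed normalisation so that "Dog" and "dog" count as one answer.
            key = answer.strip().lower()
            if key not in best or score > best[key]:
                best[key] = score
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Two illustrative queries with made-up scores.
    queries = [
        [("Dog", -0.2), ("cat", -1.0)],
        [("dog", -0.1), ("bird", -2.0)],
    ]
    print(pool_candidates(queries, k=5))
```

Setting `k=5` mirrors the number of candidates retrieved in the experiments; other values, such as the 1 and 8 candidates discussed in the ablation, can be tried by changing the same argument.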
