Human-Adversarial Visual Question Answering (Supplementary Material) A Training Details
–Neural Information Processing Systems
Unimodal. We train the models with a batch size of 64 for 88K updates, using a linear learning rate schedule starting from 1e-5 with a warmup of 2000 updates. We used a linear learning rate schedule with 2000 warmup steps.

As seen in Table 4, the category-wise performance of BERT differs from that of the multimodal models: the multimodal models perform better on the number and other categories while lagging on the yes/no category.

For each question, we collect ten answers from ten different annotators using this interface. We provide explicit instructions following [20] and [57] to avoid ambiguity and to collect short, relevant answers.
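The learning rate schedule described for the unimodal models can be illustrated with a small sketch. It assumes that 1e-5 is the peak rate reached at the end of the 2000-update warmup, that the rate rises linearly from zero during warmup, and that it then decays linearly to zero at 88K updates; the text does not state these details, and the function name `linear_schedule_lr` is hypothetical rather than taken from the authors' code.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=2000, total_steps=88_000):
    """Learning rate at a given update under linear warmup then linear decay.

    Assumptions (not confirmed by the paper): base_lr is the peak rate,
    warmup starts from zero, and decay ends at zero after total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at a few illustrative updates.
    for step in (0, 1000, 2000, 45_000, 88_000):
        print(step, linear_schedule_lr(step))
```

In a training loop, a function of this shape would typically be evaluated once per update and its value assigned to the optimizer's learning rate.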