Human-Adversarial Visual Question Answering (Supplementary Material)
A Training Details
–Neural Information Processing Systems
We use a batch size of 64 for 236K updates with a multi-step learning rate scheduler with steps at 180K and 216K, a learning rate ratio of 0.2, and a warmup of 54K updates. The training takes an average of 8 hours. The training takes an average of 17 hours. We set the batch size to 8 and the weight decay to 1e-4, and train the model on 8 GPUs for 2 days. MLM loss using a batch size of 64, which takes an average of 13 hours.
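The schedule described above (236K updates, steps at 180K and 216K, ratio 0.2, 54K warmup updates) can be sketched as a learning rate multiplier. This is a minimal illustration, not the authors' implementation: the function name `lr_multiplier`, the linear warmup starting from zero, and the reading of the ratio as a factor applied once per step boundary are all assumptions.

```python
# Hypothetical sketch of the multi-step schedule with warmup.
# Assumptions (not stated in the source): warmup is linear from 0,
# and the learning rate ratio is multiplied in at each step boundary.

TOTAL_UPDATES = 236_000
WARMUP_UPDATES = 54_000
LR_STEPS = (180_000, 216_000)
LR_RATIO = 0.2


def lr_multiplier(update: int) -> float:
    """Return the factor applied to the base learning rate at a given update."""
    if update < WARMUP_UPDATES:
        # Linear warmup: ramp from 0 to 1 over the warmup period.
        return update / WARMUP_UPDATES
    # Count how many step boundaries have been passed so far.
    passed = sum(1 for step in LR_STEPS if update >= step)
    return LR_RATIO ** passed


if __name__ == "__main__":
    for u in (0, 27_000, 54_000, 179_999, 180_000, 216_000, TOTAL_UPDATES - 1):
        print(f"update {u:>7}: multiplier {lr_multiplier(u):.4f}")
```

Under these assumptions the multiplier stays at 1.0 between the end of warmup and update 180K, then drops to 0.2, and drops again to 0.04 at update 216K. A function of this shape could be passed to a generic lambda-based scheduler in a training framework.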
Nov-15-2025, 09:38:15 GMT