4c26774d852f62440fc746ea4cdd57f6-Supplemental.pdf

Neural Information Processing Systems 

We use two different MLP heads on top of the transformed value of the CLS token to extract the final answer, one head for descriptive questions and one head for multiple choice questions. We also used a weight decay of 0.01. We apply our model to the original CLEVR dataset [21], for which we have ground-truth segmentation masks. Moreover, even for the questions where the removed object is causally connected to the other objects, about 45% can be answered perfectly by an algorithm answering the question as if it were a descriptive question. To quantify this, we wrote a symbolic executor that uses the provided ground-truth video annotations and parsed questions to determine causal connectivity and whether each choice happened in the non-counterfactual scenario.
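The two checks attributed to the symbolic executor above can be illustrated with a minimal sketch. This is not the authors' executor: the `Collision` record, the function names, and the assumption that annotations reduce to a list of timestamped pairwise collisions are all hypothetical, chosen only to make the idea concrete. Causal connectivity is approximated here as reachability through a time-ordered chain of collisions starting at the removed object, and a choice is treated as "descriptive" if its collision occurred in the observed (non-counterfactual) video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collision:
    """Hypothetical annotation record: objects `a` and `b` collide at `frame`."""
    frame: int
    a: int
    b: int


def causally_connected(removed, collisions):
    """Return the objects influenced by `removed` via time-ordered collisions.

    Assumption: influence spreads only forward in time, so a collision
    passes it on only if one participant was already affected.
    """
    affected = {removed}
    for c in sorted(collisions, key=lambda c: c.frame):
        if c.a in affected or c.b in affected:
            affected.update((c.a, c.b))
    return affected - {removed}


def happened_in_observed_video(choice, collisions):
    """Return True if the pair in `choice` collided in the observed video."""
    pair = frozenset(choice)
    return any(frozenset((c.a, c.b)) == pair for c in collisions)


# Toy annotations: 0 hits 1, then 1 hits 2; objects 3 and 4 collide separately.
collisions = [Collision(10, 0, 1), Collision(20, 1, 2), Collision(5, 3, 4)]
connected = causally_connected(0, collisions)
descriptive = happened_in_observed_video((1, 2), collisions)
```

In this toy scene, removing object 0 is causally connected to objects 1 and 2 but not to 3 or 4, and the choice "1 and 2 collide" did happen in the observed video. A baseline that answers a counterfactual choice with `happened_in_observed_video` alone is correct only when the removal does not change that outcome.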
