4c26774d852f62440fc746ea4cdd57f6-Supplemental.pdf
Neural Information Processing Systems
We use two different MLP heads on top of the transformed value of the CLS token to extract the final answer, one head for descriptive questions and one head for multiple choice questions. We also used a weight decay of 0.01. We apply our model to the original CLEVR dataset [21], for which we have ground-truth segmentation masks. Moreover, even for the questions where the removed object is causally connected to the other objects, about 45% can be answered perfectly by an algorithm answering the question as if it were a descriptive question. To quantify this, we wrote a symbolic executor that uses the provided ground-truth video annotations and parsed questions to determine causal connectivity and whether each choice happened in the non-counterfactual scenario.
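The descriptive-style baseline described above can be illustrated with a minimal sketch. This is not the authors' symbolic executor: the `Collision` record, the object names, and both helper functions are hypothetical, and causal connectivity is approximated here as reachability in an undirected collision graph that ignores event timing. The sketch only shows the idea of answering each counterfactual choice by checking whether the event happened in the observed (non-counterfactual) video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collision:
    """One observed collision between two object ids (hypothetical annotation format)."""
    a: str
    b: str

    def involves(self, obj: str) -> bool:
        return obj in (self.a, self.b)

    def key(self) -> frozenset:
        # Order-independent identifier for the colliding pair.
        return frozenset((self.a, self.b))


def causally_connected(removed: str, collisions: list) -> set:
    """Objects reachable from `removed` through observed collisions.

    Assumption: connectivity is plain graph reachability; timing is ignored.
    """
    reached = {removed}
    frontier = [removed]
    while frontier:
        current = frontier.pop()
        for c in collisions:
            if c.involves(current):
                other = c.b if c.a == current else c.a
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    reached.discard(removed)
    return reached


def descriptive_baseline(choices: list, collisions: list) -> list:
    """Answer each choice as if it were a descriptive question:
    True if that collision happened in the observed video."""
    observed = {c.key() for c in collisions}
    return [frozenset(choice) in observed for choice in choices]


# Toy annotations for one video (made-up object names).
collisions = [
    Collision("cube", "sphere"),
    Collision("sphere", "cylinder"),
    Collision("cone", "torus"),
]

# Counterfactual question: "what happens if the cube is removed?"
connected = causally_connected("cube", collisions)
answers = descriptive_baseline(
    [("sphere", "cylinder"), ("cone", "sphere")], collisions
)
```

In this toy case the removed cube is connected to the sphere and the cylinder, so the question belongs to the causally connected subset, and the baseline still returns an answer for every choice without simulating the counterfactual scenario. Comparing such answers against the ground-truth labels is one way a fraction like the 45% above could be measured.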
Feb-8-2026, 13:36:57 GMT
- Technology:
- Information Technology (0.48)