4c26774d852f62440fc746ea4cdd57f6-Supplemental.pdf

Feb-8-2026, 13:36:57 GMT–Neural Information Processing Systems

We use two different MLP heads on top of the transformed value of the CLS token to extract the final answer, one head for descriptive questions and one head for multiple choice questions. We also used a weight decay of 0.01. We apply our model to the original CLEVR dataset [21], for which we have ground-truth segmentation masks. Moreover, even for the questions where the removed object is causally connected to the other objects, about 45% can be answered perfectly by an algorithm answering the question as if it were a descriptive question. To quantify this, we wrote a symbolic executor that uses the provided ground-truth video annotations and parsed questions to determine causal connectivity and whether each choice happened in the non-counterfactual scenario.
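The descriptive baseline mentioned above can be illustrated with a minimal sketch. This is not the authors' symbolic executor. The annotation format (collisions as pairs of object ids), the function names, and the rule that an object counts as causally connected when it takes part in at least one observed collision are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical annotation format: each observed collision is a pair of object ids.
Collision = FrozenSet[int]


def observed_collisions(pairs: Iterable[Tuple[int, int]]) -> Set[Collision]:
    """Normalize (a, b) pairs so that the order of the two objects does not matter."""
    return {frozenset(p) for p in pairs}


def is_causally_connected(removed: int, collisions: Set[Collision]) -> bool:
    """Assumption: an object is causally connected to the others if it
    takes part in at least one collision in the observed video."""
    return any(removed in c for c in collisions)


def descriptive_baseline(choices: List[Tuple[int, int]],
                         collisions: Set[Collision]) -> List[bool]:
    """Answer each counterfactual choice as if it were descriptive:
    predict True exactly when that collision happened in the observed video."""
    return [frozenset(c) in collisions for c in choices]


# Toy annotations: object 0 hits object 1, then object 1 hits object 2.
events = observed_collisions([(0, 1), (1, 2)])
print(is_causally_connected(3, events))
print(descriptive_baseline([(2, 1), (0, 2)], events))
```

Comparing the baseline's predictions with the ground-truth counterfactual answers, separately for connected and unconnected removed objects, would give the kind of fraction the excerpt reports.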

collide, collide 2, metal cube, (15 more...)



