Review for NeurIPS paper: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Jan-22-2025, 11:33:40 GMT–Neural Information Processing Systems

Additional Feedback: * Adding more details about graph isomorphism networks and Sinkhorn normalization to the model section on page 4 would be useful. I'm wondering why the standard CLEVR questions are not used to measure that. I believe that unless the newly introduced data allows testing new aspects or tasks, it's better to use common data for comparability with prior approaches. In addition, the standard CLEVR questions allow more variability in answers and in the reasoning skills needed than true/false statements do, and they are carefully constructed to mitigate shortcuts and biases, so they may be a better benchmark for the task of compositional reasoning. If so, when are the new True/False generated statements discussed in the bottom part of page 5 used?

compositional generalization, multimodal graph network, neurips paper

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)