Review for NeurIPS paper: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
–Neural Information Processing Systems
Additional Feedback: * Adding more details about graph isomorphism networks and Sinkhorn normalization to the model section on page 4 would be useful. I'm wondering why the standard CLEVR questions are not used to measure that. I believe that, as long as the newly introduced data does not provide or allow testing of new aspects or tasks, it is better to use common data for better comparability to prior approaches. In addition, the standard CLEVR questions allow more variability in answers and in the reasoning skills needed than true/false statements do, and they are carefully constructed to mitigate shortcuts and biases, so they may be a better benchmark for the task of compositional reasoning. If so, when are the new True/False generated statements discussed in the bottom part of page 5 used?
Jan-22-2025, 11:33:40 GMT