AITopics | multimodal graph network

multimodal graph network

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Neural Information Processing Systems | Dec-23-2025, 20:22:46 GMT

compositional generalization, multimodal graph network, name change, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Neural Information Processing Systems | Oct-2-2025, 10:12:57 GMT

Work done at Princeton as a Fulbright Scholar.

artificial intelligence, machine learning, natural language, (16 more...)

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Review for NeurIPS paper: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Neural Information Processing Systems | Jan-22-2025, 11:33:40 GMT

Additional Feedback: Adding more details about graph isomorphism networks and Sinkhorn normalization to the model section on page 4 would be useful. Why not use the standard CLEVR questions to measure that? I believe that as long as the newly introduced data doesn't provide or allow testing of new aspects or tasks, it is better to use common data for comparability with prior approaches. In addition, the standard CLEVR questions allow more variability in answers and required reasoning skills than true/false statements, and they are carefully constructed to mitigate shortcuts and biases, so they may be a better benchmark for compositional reasoning. If so, when are the newly generated true/false statements discussed at the bottom of page 5 used?

compositional generalization, multimodal graph network, neurips paper, (3 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)

Review for NeurIPS paper: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Neural Information Processing Systems | Jan-22-2025, 11:33:33 GMT

After the author response and discussion, all reviewers recommend (weak) accept of this paper for its contributions, including:
- Significant improvements on the synthetic CLEVR/CLOSURE task
- Overall novel and interesting method
I accept the paper with the expectation that the authors will improve and clarify it according to the author response and the reviewers' suggestions, including discussion of related work. The main concern of the reviewers and me is that the paper limits its experimental evaluation to the synthetic CLEVR dataset. The authors are strongly encouraged to include results on a non-synthetic dataset (e.g. VQA-CP, NLVR/2, GQA, or subsets if necessary) in the final version, even if this yields a negative result, which the authors could then analyze.

compositional generalization, multimodal graph network, neurips paper, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Neural Information Processing Systems | Oct-9-2024, 17:42:22 GMT

Compositional generalization is a key challenge in grounding natural language to visual perception. While deep learning models have achieved great success in multimodal tasks like visual question answering, recent studies have shown that they fail to generalize to new inputs that are simply an unseen combination of those seen in the training distribution. In this paper, we propose to tackle this challenge by employing neural factor graphs to induce a tighter coupling between concepts in different modalities. Graph representations are inherently compositional in nature and allow us to capture entities, attributes and relations in a scalable manner. Our model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions.
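The abstract does not say how the factor correspondence matrix is computed. The review excerpt on this page mentions Sinkhorn normalization, so the following is a minimal sketch of that general technique and not the authors' implementation. The function name `sinkhorn`, the iteration count, and the example scores are all assumptions made for illustration. Given a square matrix of similarity scores between text nodes and image nodes, it alternately normalizes rows and columns so the result approaches a doubly stochastic (soft one-to-one) matching.

```python
import math

def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores)."""
    top = max(max(row) for row in scores)
    # Exponentiate (shifted by the max for stability) so all entries are positive.
    m = [[math.exp(s - top) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: rescale each row to sum to 1.
        row_sums = [sum(row) for row in m]
        m = [[v / r for v in row] for row, r in zip(m, row_sums)]
        # Column step: rescale each column to sum to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / c for v, c in zip(row, col_sums)] for row in m]
    return m

# Hypothetical similarity scores: rows are text nodes, columns are image nodes.
scores = [[2.0, 0.1, 0.0],
          [0.2, 1.5, 0.1],
          [0.0, 0.3, 1.8]]
corr = sinkhorn(scores)

# Index of the best-matching image node for each text node.
best_match = [row.index(max(row)) for row in corr]
```

Each entry of `corr` can be read as a soft assignment weight between one text node and one image node. The sketch assumes a square score matrix; with unequal numbers of nodes, rows and columns cannot both sum to 1.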

compositional generalization, induce, multimodal graph network

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)
