AITopics | learning visual dialog agent

learning visual dialog agent

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Neural Information Processing Systems, Dec-24-2025, 19:54:05 GMT

dialog data, learning visual dialog agent, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence (0.45)

Review for NeurIPS paper: Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Neural Information Processing Systems, Feb-7-2025, 14:42:38 GMT

Weaknesses: The main problem with the paper is the game design. In visual dialogue, i.e., the GuessWhich game [2], Q-Bot does not have access to the image. It has to build up the visual representation based on the caption and dialogue. That is why having a caption is important for the GuessWhich game (L69). In the proposed game, by contrast, Q-Bot has constant access to the images, so it just needs to ask questions that distinguish one image from the others.

dialogue, learning visual dialog agent, vqa data, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

Review for NeurIPS paper: Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Neural Information Processing Systems, Feb-7-2025, 14:42:30 GMT

All reviewers agree that this submission is above the acceptance threshold, and they all agree that decoupling text generation from policy learning during RL is a compelling and interesting idea. I would also like to recommend acceptance, with two notes: 1) The reviewers raised a number of questions that were addressed in the author response, most of which are already contained in the supplementary material, so I would advise the authors to incorporate these points into the main manuscript. 2) I see your method as a way to also deal with language drift more generally. There are a couple of recent papers on dealing with language drift. For example, Lee et al. (2019) deal with language drift through image grounding, while Lazaridou et al. (2020) and Lu et al. (2020) also decouple generation and policy learning, the former through reranking of language modelling samples using the RL reward and the latter through distillation, so that the RL signal never disrupts the core language knowledge. Are any of these methods superior to the others?

arxiv preprint arxiv, language drift, learning visual dialog agent, (5 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Neural Information Processing Systems, Oct-11-2024, 16:13:09 GMT

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to a new task, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language level supervision. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans.

dialog data, learning visual dialog agent, new task, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.66)