Goto

Collaborating Authors

 a-bot




Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser

Zheng, Duo, Xu, Zipeng, Meng, Fandong, Wang, Xiaojie, Wang, Jiaan, Zhou, Jie

arXiv.org Artificial Intelligence

Considering the importance of building a good Visual Dialog (VD) Questioner, many researchers study the topic under a Q-Bot-A-Bot image-guessing game setting, where the Questioner needs to raise a series of questions to collect information of an undisclosed image. Despite progress has been made in Supervised Learning (SL) and Reinforcement Learning (RL), issues still exist. Firstly, previous methods do not provide explicit and effective guidance for Questioner to generate visually related and informative questions. Secondly, the effect of RL is hampered by an incompetent component, i.e., the Guesser, who makes image predictions based on the generated dialogs and assigns rewards accordingly. To enhance VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns entity-based questioning strategy from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and is optimized for the VD setting especially. Experimental results on the VisDial v1.0 dataset show that our approach achieves state-of-theart performance on both image-guessing task and question diversity. Human study further proves that our model generates more visually related, informative and coherent questions.


Definitions of intent suitable for algorithms

Ashton, Hal

arXiv.org Artificial Intelligence

Intent modifies an actor's culpability of many types wrongdoing. Autonomous Algorithmic Agents have the capability of causing harm, and whilst their current lack of legal personhood precludes them from committing crimes, it is useful for a number of parties to understand under what type of intentional mode an algorithm might transgress. From the perspective of the creator or owner they would like ensure that their algorithms never intend to cause harm by doing things that would otherwise be labelled criminal if committed by a legal person. Prosecutors might have an interest in understanding whether the actions of an algorithm were internally intended according to a transparent definition of the concept. The presence or absence of intention in the algorithmic agent might inform the court as to the complicity of its owner. This article introduces definitions for direct, oblique (or indirect) and ulterior intent which can be used to test for intent in an algorithmic actor.


Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Cogswell, Michael, Lu, Jiasen, Jain, Rishabh, Lee, Stefan, Parikh, Devi, Batra, Dhruv

arXiv.org Artificial Intelligence

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at https://github.com/mcogswell/dialog_without_dialog


On Emergent Communication in Competitive Multi-Agent Teams

Liang, Paul Pu, Chen, Jeffrey, Salakhutdinov, Ruslan, Morency, Louis-Philippe, Kottur, Satwik

arXiv.org Artificial Intelligence

Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competition acts as an additional external pressure for improvement. In this work, we investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed. We start from Task & Talk, a previously proposed referential game between two cooperative agents as our testbed and extend it into Task, Talk & Compete, a game involving two competitive teams each consisting of two aforementioned cooperative agents. Using this new setting, we provide an empirical study demonstrating the impact of competitive influence on multi-agent teams. Our results show that an external competitive influence leads to improved accuracy and generalization, as well as faster emergence of communicative languages that are more informative and compositional.


Improving Generative Visual Dialog by Answering Diverse Questions

Murahari, Vishvak, Chattopadhyay, Prithvijit, Batra, Dhruv, Parikh, Devi, Das, Abhishek

arXiv.org Artificial Intelligence

Prior work on training generative Visual Dialog models with reinforcement learning(Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Qbot and Abot during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Qbot to ask diverse questions, thus reducing repetitions and in turn enabling Abot to explore a larger state space during RL ie. be exposed to more visual concepts to talk about, and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, ie. dialog that is more diverse (ie. less repetitive), consistent (ie. has fewer conflicting exchanges), fluent (ie. more human-like),and detailed, while still being comparably image-relevant as prior work and ablations.


Emergence of Compositional Language with Deep Generational Transmission

Cogswell, Michael, Lu, Jiasen, Lee, Stefan, Parikh, Devi, Batra, Dhruv

arXiv.org Artificial Intelligence

Consider a collaborative task that requires communication. Two agents are placed in an environment and must create a language from scratch in order to coordinate. Recent work has been interested in what kinds of languages emerge when deep reinforcement learning agents are put in such a situation, and in particular in the factors that cause language to be compositional-i.e. meaning is expressed by combining words which themselves have meaning. Evolutionary linguists have also studied the emergence of compositional language for decades, and they find that in addition to structural priors like those already studied in deep learning, the dynamics of transmitting language from generation to generation contribute significantly to the emergence of compositionality. In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language. We show that this implicit cultural transmission encourages the resulting languages to exhibit better compositional generalization and suggest how elements of cultural dynamics can be further integrated into populations of deep agents.


Community Regularization of Visually-Grounded Dialog

Agarwal, Akshat, Gurumurthy, Swaminathan, Sharma, Vasu, Lewis, Mike, Sycara, Katia

arXiv.org Artificial Intelligence

The task of conducting visually grounded dialog involves learning goal-oriented cooperative dialog between autonomous agents who exchange information about a scene through several rounds of questions and answers in natural language. We posit that requiring artificial agents to adhere to the rules of human language, while also requiring them to maximize information exchange through dialog is an ill-posed problem. We observe that humans do not stray from a common language because they are social creatures who live in communities, and have to communicate with many people everyday, so it is far easier to stick to a common language even at the cost of some efficiency loss. Using this as inspiration, we propose and evaluate a multi-agent community-based dialog framework where each agent interacts with, and learns from, multiple agents, and show that this community-enforced regularization results in more relevant and coherent dialog (as judged by human evaluators) without sacrificing task performance (as judged by quantitative metrics).