Emergent Graphical Conventions in a Visual Communication Game

Neural Information Processing Systems

Due to their iconic nature (i.e., perceptual resemblance to or natural association with the referent), drawings serve as a powerful tool for communicating concepts across language barriers (Fay et al., 2014). In fact, humans began using drawings to convey messages as early as 40,000 to 60,000 years ago (Hoffmann et al., 2018; Hawkins et al., 2019).


Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data

Dutta, Parag, Dukkipati, Ambedkar

arXiv.org Artificial Intelligence

Image captioning is an important problem in developing various AI systems, and the task requires large volumes of annotated images to train models. Since all existing labelled datasets have already been used to train large Vision Language Models (VLMs), further improving their performance is challenging. This makes unsupervised image captioning, which remains relatively under-explored, an essential setting to study. To that end, we propose LoGIC (Lewis Communication Game for Image Captioning), a multi-agent reinforcement learning game. The proposed method consists of two agents, a 'speaker' and a 'listener', whose objective is to learn a strategy for communicating in natural language. We train the agents in the cooperative common-reward setting using the GRPO algorithm and show that improved image captioning performance emerges as a consequence of the agents learning to play the game. Using a pre-trained VLM as the 'speaker' and a Large Language Model (LLM) for language understanding in the 'listener', we achieve a BLEU score of 46 after fine-tuning with LoGIC without additional labels, an absolute gain of 2 points over the 44 BLEU of the vanilla VLM. Additionally, we replace the VLM in the 'speaker' with lightweight components, (i) a ViT for image perception and (ii) GPT-2 for language generation, and train them from scratch using LoGIC, obtaining a BLEU score of 31 in the unsupervised setting, a 10-point advantage over existing unsupervised image captioning methods.
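The abstract does not spell out the training loop. As a minimal sketch, assuming a referential (Lewis-style) setup in which the listener must pick the target image out of a set of candidates given the speaker's caption, the common reward and GRPO's group-relative advantage could be computed as below. The 0/1 reward, the `speaker`/`listener` callables, and `play_round` are hypothetical illustrations, not the authors' code; the sketch also omits the clipped policy ratio and KL penalty of the full GRPO objective, where these advantages would weight each sampled caption's update.

```python
import itertools
import statistics
from typing import Callable, List, Sequence


def lewis_reward(target_idx: int, guessed_idx: int) -> float:
    """Common reward shared by both agents (assumed 0/1): 1.0 when the
    listener picks the target image, else 0.0."""
    return 1.0 if guessed_idx == target_idx else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each sample's reward minus the group mean,
    divided by the group's (population) standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def play_round(
    images: Sequence[str],
    target_idx: int,
    speaker: Callable[[str], str],                  # hypothetical: image -> sampled caption
    listener: Callable[[str, Sequence[str]], int],  # hypothetical: caption, candidates -> guess
    group_size: int = 4,
) -> List[float]:
    """One game round: sample a group of captions for the target image,
    score each with the shared reward, and return group-relative advantages."""
    captions = [speaker(images[target_idx]) for _ in range(group_size)]
    rewards = [lewis_reward(target_idx, listener(c, images)) for c in captions]
    return group_relative_advantages(rewards)


if __name__ == "__main__":
    # Toy stand-ins for the VLM speaker and LLM listener (hypothetical).
    canned = itertools.cycle(["a dog on grass", "a furry animal", "a dog", "a red car"])

    def toy_speaker(image: str) -> str:
        return next(canned)

    def toy_listener(caption: str, candidates: Sequence[str]) -> int:
        # Guess the first candidate whose label appears in the caption (default 0).
        return next((i for i, c in enumerate(candidates) if c in caption), 0)

    print(play_round(["cat", "dog", "car"], target_idx=1,
                     speaker=toy_speaker, listener=toy_listener))
```

In the toy run, captions that let the listener find the dog receive positive advantages and the others negative ones. When every caption in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.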


Language Evolution with Deep Learning

Rita, Mathieu, Michel, Paul, Chaabouni, Rahma, Pietquin, Olivier, Dupoux, Emmanuel, Strub, Florian

arXiv.org Artificial Intelligence

Social animals have been found to use some means of communication to coordinate in various contexts: foraging for food, avoiding predators, mating, etc. (Hauser, 1996). Among animals, however, humans seem to be unique in having developed a communication system, natural language, that transcends these basic needs and can represent an infinite variety of new situations (Hauser et al., 2002), to the extent that language itself becomes the basis for a new form of evolution: cultural evolution. Understanding the emergence of this unique human ability has long been a vexing scientific problem because we lack access to the communication systems of intermediate steps of hominid evolution (Harnad et al., 1976; Bickerton, 2007). In the absence of data, a tempting idea has been to experimentally reproduce the process of language emergence in either humans or computational models (Steels, 1997; Myers-Scotton, 2002; Kirby, 2002). Experimental paradigms with humans (Kirby et al., 2008; Raviv et al., 2019; Motamedi et al., 2019) have produced significant insights into language evolution. Still, their scope is limited by the inability to replicate key aspects of language evolution, such as communication within and across large populations and long evolutionary timescales. Computer modeling can help overcome these limitations and has long played a prominent role in the study of language evolution (Lieberman and Crelin, 1971).


Emergent Graphical Conventions in a Visual Communication Game

Qiu, Shuwen, Xie, Sirui, Fan, Lifeng, Gao, Tao, Joo, Jungseock, Zhu, Song-Chun, Zhu, Yixin

arXiv.org Artificial Intelligence

Humans communicate with graphical sketches as well as symbolic languages (Fay et al., 2014). Because they focus primarily on the latter, recent studies of emergent communication (Lazaridou and Baroni, 2020) overlook sketches; they do not account for the evolutionary process through which symbolic sign systems emerge from the trade-off between iconicity and symbolicity. In this work, we take a first step toward modeling and simulating this process via two neural agents playing a visual communication game, in which the sender communicates with the receiver by sketching on a canvas. We devise a novel reinforcement learning method through which the agents co-evolve toward successful communication and abstract graphical conventions. To inspect the emerged conventions, we define three fundamental properties (iconicity, symbolicity, and semanticity) and design evaluation methods accordingly. Our experimental results under different controls are consistent with observations from studies of human graphical conventions (Hawkins et al., 2019; Fay et al., 2010). Notably, we find that evolved sketches can preserve the continuum of semantics (Mikolov et al., 2013) under proper environmental pressures. More interestingly, co-evolved agents can switch between conventionalized and iconic communication depending on their familiarity with the referents. We hope the present research paves the way for studying emergent communication in the modality of sketches.