Collaborating Authors

The Emergence of Compositional Languages for Numeric Concepts Through Iterated Learning in Neural Agents

arXiv.org Artificial Intelligence

Since it was first introduced, computer simulation has been an increasingly important tool in evolutionary linguistics. Recently, with the development of deep learning techniques, research in grounded language learning has also started to focus on facilitating the emergence of compositional languages without pre-defined elementary linguistic knowledge. In this work, we explore the emergence of compositional languages for numeric concepts in multi-agent communication systems. We demonstrate that compositional languages for encoding numeric concepts can emerge through iterated learning in populations of deep neural network agents. However, the properties of the emergent language depend greatly on the input representations given to the agents. We found that compositional languages emerge only if, for agents on a given input representation, they require fewer iterations to be fully learnt than other non-degenerate languages.


Co-attentional Transformers for Story-Based Video Understanding

arXiv.org Artificial Intelligence

Inspired by recent trends in vision and language learning, we explore attention mechanisms for visio-lingual fusion, applied to story-based video understanding. Like other video-based QA tasks, video story understanding requires agents to grasp complex temporal dependencies. However, because it focuses on the narrative aspect of video, it also requires understanding the interactions between different characters, as well as their actions and motivations. We propose a novel co-attentional transformer model to better capture the long-term dependencies seen in visual stories such as dramas, and we measure its performance on the video question answering task. We evaluate our approach on the recently introduced DramaQA dataset, which features character-centered video story understanding questions. Our model outperforms the baseline model by 8 percentage points overall and by at least 4.95 and up to 12.8 percentage points across the difficulty levels, and it beats the winner of the DramaQA challenge.


Disentangled Contrastive Learning for Learning Robust Textual Representations

arXiv.org Artificial Intelligence

Although self-supervised pre-training of transformer models has revolutionized natural language processing (NLP) applications and achieved state-of-the-art results on various benchmarks, this process is still vulnerable to small and imperceptible perturbations of legitimate inputs. Intuitively, representations should stay close in the feature space under subtle input perturbations and differ substantially when the meaning changes. This motivates us to investigate the learning of robust textual representations in a contrastive manner. However, it is non-trivial to obtain opposing semantic instances for textual samples. In this study, we propose a disentangled contrastive learning method that separately optimizes the uniformity and alignment of representations without negative sampling. Specifically, we introduce the concept of momentum representation consistency to align features, and we leverage power normalization while conforming to uniformity. Our experimental results on NLP benchmarks demonstrate that our approach obtains better results than the baselines and achieves promising improvements on invariance tests and under adversarial attacks. The code is available at https://github.com/zjunlp/DCL.
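
To make the two quantities named in this abstract concrete, the following is a minimal sketch of alignment and uniformity as they are commonly measured on unit-length embeddings. It is not the authors' method: the abstract's momentum representation consistency and power normalization are not shown, and the function names, the temperature `t`, and the toy vectors are assumptions chosen for illustration only.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(anchors, positives):
    """Mean squared distance between each embedding and its
    slightly altered view (lower means better aligned)."""
    pairs = list(zip(anchors, positives))
    return sum(sq_dist(a, p) for a, p in pairs) / len(pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs
    (lower means the embeddings are spread more evenly).
    The temperature t is an assumed hyperparameter."""
    pairs = list(combinations(embeddings, 2))
    total = sum(math.exp(-t * sq_dist(a, b)) for a, b in pairs)
    return math.log(total / len(pairs))


# Toy data (hypothetical): four well-spread unit vectors and
# lightly shifted copies standing in for altered inputs.
spread = [l2_normalize(v) for v in ([1, 0], [0, 1], [-1, 0], [0, -1])]
shifted = [l2_normalize([x + 0.05, y - 0.05]) for x, y in spread]
collapsed = [l2_normalize([1, 0]) for _ in range(4)]

print("alignment:", alignment(spread, shifted))
print("uniformity (spread):", uniformity(spread))
print("uniformity (collapsed):", uniformity(collapsed))
```

Neither quantity needs negative samples: alignment only looks at matched pairs, and uniformity only looks at how the batch as a whole is distributed. A collapsed representation, where every input maps to the same point, scores perfectly on alignment but worst on uniformity, which is why the two are tracked separately.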


Code-switched inspired losses for generic spoken dialog representations

arXiv.org Artificial Intelligence

A crucial step in conversational AI is the identification of underlying information of the user's utterance (e.g., communicative intent or dialog acts, and emotions). This requires modeling utterance-level information (Mitkov, 2014; Williams et al., 2014), to capture immediate nuances of the user utterance; and discourse-level features (Thornbury and Slade, 2006), to capture patterns over long ranges of the conversation. An added difficulty to this modeling problem is that most people in the world are bilingual (Grosjean and Li, 2013): therefore, progress on these systems is limited by their inability to process more than one language (English being the ...

While there has been a growing interest in pretraining for dialog (Mehri et al., 2019; Zhang et al., 2019d), the focus has mainly been on English datasets. Thus, these works cannot be directly applied to our multilingual setting. Additionally, available multilingual pretraining objectives (Lample and Conneau, 2019; Liu et al., 2020; Xue et al., 2020; Qi et al., 2021) face two main limitations when applied to dialog modeling: (1) they are a generalization of monolingual objectives that use flat input text, whereas hierarchy has been shown to be a powerful prior for dialog modeling. This is a reflection of a dialog itself, for example, context plays an essential role in the labeling of dialog acts.


A Survey on Contextual Embeddings

arXiv.org Artificial Intelligence

Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.