Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment

Jumelet, Jaap, Zuidema, Willem, Hupkes, Dieuwke

arXiv.org Artificial Intelligence 

Jaap Jumelet (jumeletjaap@gmail.com), ILLC, University of Amsterdam

Abstract

Extensive research has recently shown that recurrent neural language models are able to process a wide range of grammatical phenomena. How these models are able to perform these remarkable feats so well, however, is still an open question. To gain more insight into what information LSTMs base their decisions on, we propose a generalisation of Contextual Decomposition (GCD). In particular, this setup enables us to accurately distil which part of a prediction stems from semantic heuristics, which part truly emanates from syntactic cues, and which part arises from the model's own biases. We investigate this technique on tasks pertaining to syntactic agreement and coreference resolution and discover that the model strongly relies on a default reasoning effect to perform these tasks.

1 Introduction

Modern language models that use deep learning architectures such as LSTMs, bi-LSTMs and Transformers have shown enormous gains in performance in the last few years and are finding applications in novel domains, ranging from speech recognition and writing assistance to autonomous generation of fake news. Understanding how they reach their predictions has become a key question for NLP, not only for purely scientific, but also for practical and ethical reasons. From a linguistic perspective, a natural approach is to test the extent to which these models have learned classical linguistic constructs, such as inflectional morphology, constituency structure, agreement between verb and subject, filler-gap dependencies, negative polarity or reflexive anaphora. An influential paper using this approach was presented by Linzen et al. (2016), who investigated the performance of an LSTM-based language model on number agreement.
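The number agreement evaluation of Linzen et al. (2016) compares the probability a language model assigns to the grammatical verb form against its ungrammatical counterpart. The sketch below illustrates that comparison; `lm_logprob` and its toy score table are hypothetical stand-ins (not the paper's model or data) so the example runs on its own, assuming any trained LM could be substituted in their place.

```python
import math

# Hypothetical conditional log-probabilities P(verb | context),
# illustrative numbers only; a real probe would query a trained LM.
TOY_SCORES = {
    ("the keys to the cabinet", "are"): math.log(0.6),
    ("the keys to the cabinet", "is"): math.log(0.4),
}

def lm_logprob(context: str, word: str) -> float:
    """Stand-in for a language model's log P(word | context)."""
    return TOY_SCORES[(context, word)]

def agreement_correct(context: str, correct: str, wrong: str) -> bool:
    """The model 'passes' an agreement item if it assigns the grammatical
    verb form a higher probability than the ungrammatical one."""
    return lm_logprob(context, correct) > lm_logprob(context, wrong)

# "The keys to the cabinet" has a plural head noun ("keys"), so the
# grammatical continuation is "are", despite the singular attractor
# noun "cabinet" sitting closer to the verb.
print(agreement_correct("the keys to the cabinet", "are", "is"))
```

Attractor nouns like "cabinet" are what make such items diagnostic: a model relying on linear proximity rather than syntactic structure would prefer the singular verb.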
