Semantics, not syntax, creates NLU - Pat Inc - Medium


A scientific hypothesis starts the process of scientific enquiry. A false hypothesis can start the path to disaster, as was seen with the geocentric model of the 'universe', in which heavenly bodies moved in circular orbits around a stationary earth. It became heresy to suggest otherwise, and astronomers patched the model with epicycles to preserve circular orbits. It's a story worth studying in school to appreciate how a hypothesis is critical to validating science. Here's an important hypothesis: "The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences."

Anti-efficient encoding in emergent communication Artificial Intelligence

Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language. One fundamental characteristic of the latter, known as Zipf's Law of Abbreviation (ZLA), is that more frequent words are efficiently associated with shorter strings. We study whether the same pattern emerges when two neural networks, a "speaker" and a "listener", are trained to play a signaling game. Surprisingly, we find that networks develop an anti-efficient encoding scheme, in which the most frequent inputs are associated with the longest messages, and messages in general are skewed towards the maximum length threshold. This anti-efficient code appears easier to discriminate for the listener, and, unlike in human communication, the speaker does not impose a contrasting least-effort pressure towards brevity. Indeed, when the cost function includes a penalty for longer messages, the resulting message distribution starts respecting ZLA. Our analysis stresses the importance of studying the basic features of emergent communication in a highly controlled setup, to ensure the latter will not stray too far from human language. Moreover, we present a concrete illustration of how different functional pressures can lead to successful communication codes that lack basic properties of human language, thus highlighting the role such pressures play in the latter.
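The ZLA test described above amounts to checking the sign of the correlation between word frequency and message length. A minimal sketch (the toy vocabularies below are illustrative, not the paper's data):

```python
# Sketch: testing Zipf's Law of Abbreviation on a toy code.
# Under ZLA, frequency and message length should be negatively correlated;
# in an anti-efficient code the correlation is positive.

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# (frequency, message length) pairs for a ZLA-respecting code and an
# anti-efficient one where frequent inputs get the longest messages.
efficient = [(1000, 1), (400, 2), (150, 3), (60, 4), (10, 5)]
anti_efficient = [(1000, 5), (400, 4), (150, 3), (60, 2), (10, 1)]

r_eff = pearson(*zip(*efficient))
r_anti = pearson(*zip(*anti_efficient))
print(r_eff < 0, r_anti > 0)  # ZLA: negative; anti-efficient: positive
```

The paper's finding is that, without a length penalty in the cost function, the emergent code looks like the second case rather than the first.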

A Neural Network Architecture for Learning Word-Referent Associations in Multiple Contexts Machine Learning

This article proposes a biologically inspired neurocomputational architecture which learns associations between words and referents in different contexts, considering evidence collected from the psycholinguistics and neurolinguistics literature. The multi-layered architecture takes as input raw images of objects (referents) and streams of words' phonemes (labels), builds an adequate representation, recognizes the current context, and associates labels with referents incrementally, by employing a Self-Organizing Map which creates new association nodes (prototypes) as required, adjusts the existing prototypes to better represent the input stimuli, and removes prototypes that become obsolete or unused. The model takes into account the current context to retrieve the correct meaning of words with multiple meanings. Simulations show that the model can reach up to 78% word-referent association accuracy in ambiguous situations and approximates well the learning rates of humans as reported by three different authors in five Cross-Situational Word Learning experiments, also displaying similar learning patterns in the different learning conditions.
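The create/adjust/remove prototype cycle can be sketched with a minimal growing prototype map. This is a hypothetical illustration of the general idea, not the paper's architecture; the threshold and learning rate are made-up parameters:

```python
# Sketch of a growing prototype map: nearest-prototype matching, with a
# new association node created when no prototype is close enough.
# (create_threshold and learning_rate are hypothetical parameters.)

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class GrowingMap:
    def __init__(self, create_threshold=1.0, learning_rate=0.3):
        self.prototypes = []          # list of (vector, label) nodes
        self.create_threshold = create_threshold
        self.learning_rate = learning_rate

    def learn(self, stimulus, label):
        """Associate a stimulus vector with a label, growing if needed."""
        if self.prototypes:
            i, (proto, _) = min(
                enumerate(self.prototypes),
                key=lambda p: distance(p[1][0], stimulus))
            if distance(proto, stimulus) < self.create_threshold:
                # Adjust the winning prototype toward the stimulus.
                updated = [p + self.learning_rate * (s - p)
                           for p, s in zip(proto, stimulus)]
                self.prototypes[i] = (updated, label)
                return
        # No sufficiently close prototype: create a new association node.
        self.prototypes.append((list(stimulus), label))

    def recall(self, stimulus):
        _, label = min(self.prototypes,
                       key=lambda p: distance(p[0], stimulus))
        return label

m = GrowingMap()
m.learn([0.0, 0.0], "ball")
m.learn([5.0, 5.0], "dog")
m.learn([0.2, 0.1], "ball")   # adjusts the "ball" prototype, no new node
print(len(m.prototypes), m.recall([0.1, 0.1]))  # 2 ball
```

The full model additionally conditions recall on a recognized context, which is what lets it disambiguate words with multiple meanings.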

Synergies in learning words and their referents

Neural Information Processing Systems

This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects and the words that refer to them. We show (i) that modelling inter-word dependencies improves the accuracy not only of the word segmentation but also of the word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these.
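The paper's Adaptor Grammar models are Bayesian and non-parametric, which is well beyond a short sketch; as a much simpler illustration of the segmentation task itself, here is greedy longest-match segmentation against a known lexicon (the lexicon and input are hypothetical):

```python
# Greedy longest-match word segmentation of an unsegmented symbol stream.
# This is a deliberately simple stand-in for the task the paper's
# Adaptor Grammars solve jointly with referent learning.

lexicon = {"the", "dog", "see", "sees", "a", "ball"}

def segment(stream):
    """Split an unsegmented string using longest lexicon matches first."""
    words, i = [], 0
    while i < len(stream):
        for j in range(len(stream), i, -1):   # try the longest span first
            if stream[i:j] in lexicon:
                words.append(stream[i:j])
                i = j
                break
        else:
            # No known word starts here: emit one symbol and move on.
            words.append(stream[i])
            i += 1
    return words

print(segment("thedogseesaball"))  # ['the', 'dog', 'sees', 'a', 'ball']
```

The paper's point is that the lexicon is not given in advance: segmentation and word-object mapping are learned jointly, and each improves the other.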

Computational and Robotic Models of Early Language Development: A Review Artificial Intelligence

We review computational and robotics models of early language learning and development. We first explain why and how these models are used to better understand how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including modeling of large-scale empirical data about language acquisition in real-world environments.
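Of the mechanisms listed above, cross-situational statistical learning is the easiest to illustrate: across individually ambiguous scenes, word-object co-occurrence counts converge on the correct mapping. A minimal sketch with hypothetical words and objects:

```python
# Sketch of cross-situational statistical learning: no single trial
# identifies a word's referent, but co-occurrence counts across trials do.
from collections import defaultdict

cooc = defaultdict(int)

# Each trial pairs an utterance with the set of visible objects; within
# a single trial the referent of each word is ambiguous.
trials = [
    (["ball", "dog"], {"BALL", "DOG"}),
    (["ball", "cup"], {"BALL", "CUP"}),
    (["dog", "cup"], {"DOG", "CUP"}),
]

for words, objects in trials:
    for w in words:
        for o in objects:
            cooc[(w, o)] += 1

def best_referent(word, objects):
    """Pick the object that co-occurred most often with the word."""
    return max(objects, key=lambda o: cooc[(word, o)])

print(best_referent("ball", {"BALL", "DOG", "CUP"}))  # BALL
```

The models the review covers are considerably richer (Bayesian inference, embodied and social constraints), but they build on this same statistical signal.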

On the Winograd Schema Challenge: Levels of Language Understanding and the Phenomenon of the Missing Text Artificial Intelligence

The Winograd Schema (WS) challenge has been proposed as an alternative to the Turing Test as a test for machine intelligence. In this short paper we "situate" the WS challenge in the data-information-knowledge continuum, suggesting in the process what a good WS is. Furthermore, we suggest that the WS is a special case of a more general phenomenon in language understanding, namely the phenomenon of the "missing text". In particular, we will argue that what we usually call thinking in the process of language understanding almost always involves discovering the missing text - text that is rarely explicitly stated but is implicitly assumed as shared background knowledge. We therefore suggest extending the WS challenge to include tests beyond those involving reference resolution, including examples that require discovering the missing text in situations that are usually treated in computational linguistics under different labels, such as metonymy, quantifier scope ambiguity, lexical disambiguation, and co-predication, to name a few.

The Computational Metaphor and Artificial Intelligence: A Reflective Examination of a Theoretical Falsework

AI Magazine

Just how sensitive the field can be is illustrated by the reaction to Winograd and Flores's (1986) book Understanding Computers and Cognition. In personal comments, the book and its authors have been savaged. Published comments are, of course, more temperate (Vellino et al. 1987) but still reveal the hypersensitivity; similar reactions to Penrose's (1989) even more recent book The Emperor's New Mind have been observed. Like Suchman (1987) and Clancey (1987), we feel that insights of significant value are to be gained from an objective consideration of traditional and alternative perspectives. Some efforts in this direction are evident (Haugeland [1985], Hill [1989], and Born [1987], for example), but the issue requires additional and ongoing attention.

Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction Machine Learning

Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.
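The script-knowledge component of such a model can be caricatured as conditional frequencies: given the current script event, score candidate upcoming referents by how often they followed that event. The "restaurant" events and counts below are hypothetical toy data, not the paper's corpus:

```python
# Sketch of script-based referent prediction: rank upcoming discourse
# referents by their conditional frequency given the current script event.
from collections import Counter

# Toy script knowledge: for each event type, counts of referents observed
# to be mentioned next (hypothetical numbers).
script_counts = {
    "order_food": Counter({"waiter": 6, "menu": 3, "bill": 1}),
    "pay": Counter({"bill": 7, "waiter": 2, "door": 1}),
}

def predict_referent(event):
    """Most probable upcoming referent given the current script event."""
    counts = script_counts[event]
    total = sum(counts.values())
    referent, n = counts.most_common(1)[0]
    return referent, n / total

print(predict_referent("order_food"))  # ('waiter', 0.6)
```

The paper's actual models combine such common-sense expectations with linguistic knowledge, and the finding is that adding the script component significantly improves the fit to human predictions.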

Evaluating the Pairwise Event Salience Hypothesis in Indexter

AAAI Conferences

Indexter is a plan-based computational model of narrative discourse which leverages cognitive scientific theories of how events are stored in memory during online comprehension. These discourse models are valuable for static and interactive narrative generation systems because they allow the author to reason about the audience's understanding and attention as they experience a story. A pair of Indexter events can share up to five indices: protagonist, time, space, causality, and intentionality. We present the first in a planned series of evaluations that will explore increasingly nuanced methods of using these indices to predict salience. The Pairwise Event Salience Hypothesis states that when a past event shares one or more indices with the most recently narrated event, that past event is more salient than one which shares no indices with the most recently narrated event. A crowd-sourced (n=200) study of 24 short text stories that control for content, text, and length supports this hypothesis. While this is encouraging, we believe it also motivates the development of a richer model that accounts for intervening events, narrative complexity, and episodic memory decay.
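The hypothesis itself is a simple predicate over event pairs: a past event is predicted salient iff it shares at least one of the five indices with the most recently narrated event. A sketch, with a hypothetical dictionary encoding of events:

```python
# Sketch of the Pairwise Event Salience Hypothesis over Indexter's five
# indices. The event encoding (dicts of index -> value) is a hypothetical
# illustration, not Indexter's internal representation.

INDICES = ("protagonist", "time", "space", "causality", "intentionality")

def shared_indices(event_a, event_b):
    """Return the indices on which two events share a value."""
    return [i for i in INDICES
            if event_a.get(i) is not None and event_a.get(i) == event_b.get(i)]

def predicted_salient(past_event, current_event):
    """Hypothesis: salient iff the pair shares one or more indices."""
    return len(shared_indices(past_event, current_event)) >= 1

current = {"protagonist": "Alice", "time": "night", "space": "castle"}
past_a = {"protagonist": "Alice", "time": "morning", "space": "forest"}
past_b = {"protagonist": "Bob", "time": "morning", "space": "forest"}

print(predicted_salient(past_a, current))  # True: shares the protagonist index
print(predicted_salient(past_b, current))  # False: shares no indices
```

The "richer model" the authors call for would replace this binary predicate with something graded, e.g. discounting shared indices by intervening events and memory decay.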

Gazetteer-Independent Toponym Resolution Using Geographic Word Profiles

AAAI Conferences

Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
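The overlap computation can be sketched as combining per-word distributions over grid cells of the earth's surface and picking the cell where they agree most. The grid cells and probabilities below are toy values for illustration, not learned geo-profiles:

```python
# Sketch of gazetteer-free toponym resolution via geographic word
# profiles: each word carries a distribution over grid cells, and a text
# span resolves to the cell maximizing the product of its words' profiles.

# Cells keyed by (lat_band, lon_band); probabilities are hypothetical.
profiles = {
    "springfield": {(39, -89): 0.4, (42, -72): 0.4, (37, -93): 0.2},
    "illinois":    {(39, -89): 0.8, (41, -87): 0.2},
    "lincoln":     {(39, -89): 0.5, (40, -96): 0.5},
}

def resolve(words):
    """Pick the grid cell maximizing the product of word geo-profiles."""
    cells = set().union(*(profiles[w] for w in words))
    def score(cell):
        prod = 1.0
        for w in words:
            prod *= profiles[w].get(cell, 1e-6)  # smoothing for absent cells
        return prod
    return max(cells, key=score)

print(resolve(["springfield", "illinois"]))  # (39, -89)
```

Note how "springfield" alone is ambiguous between several cells, but the context word "illinois" concentrates the overlap on one; this is the mechanism that lets the resolver work without a gazetteer entry for the toponym.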