lexical ambiguity
Speakers Fill Lexical Semantic Gaps with Context
Pimentel, Tiago, Maudslay, Rowan Hall, Blasi, Damián, Cotterell, Ryan
Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear -- resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of meanings it can take, and provide two ways to estimate this -- one which requires human annotation (using WordNet), and one which does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. $\rho = 0.40$ in English). We then test our main hypothesis -- that a word's lexical ambiguity should negatively correlate with its contextual uncertainty -- and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
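The entropy-based notion of ambiguity described above can be illustrated with a minimal sketch. It is not the authors' estimator (neither their WordNet-based nor their BERT-based one). It only computes the Shannon entropy of an empirical sense distribution, assuming each occurrence of a word has already been given a sense label. The function name `sense_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def sense_entropy(sense_labels):
    """Shannon entropy (in bits) of the empirical distribution of senses.

    `sense_labels` holds one sense label per observed occurrence of a word.
    This is an illustrative assumption about the input, not the paper's setup.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    # H = -sum_m p(m) * log2 p(m), with p(m) estimated by relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data: a word used equally often in two senses carries 1 bit of ambiguity.
balanced = ["river", "river", "money", "money"]
# A word that is almost always used in one sense is less ambiguous.
skewed = ["river", "money", "money", "money"]

print(sense_entropy(balanced))  # 1.0
print(sense_entropy(skewed))
```

Under this definition, a word with a single sense has zero entropy, and entropy grows as usage spreads more evenly across more senses.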
Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models
Cevoli, Benedetta, Watkins, Chris, Gao, Yang, Rastle, Kathleen
Lexical ambiguity presents a profound and enduring challenge to the language sciences. Researchers for decades have grappled with the problem of how language users learn, represent and process words with more than one meaning. Our work offers new insight into the psychological understanding of lexical ambiguity through a series of simulations that capitalise on recent advances in contextual language models. These models have no grounded understanding of the meanings of words at all; they simply learn to predict words based on the surrounding context provided by other words. Yet, our analyses show that their representations capture fine-grained meaningful distinctions between unambiguous, homonymous, and polysemous words that align with lexicographic classifications and psychological theorising. These findings provide quantitative support for modern psychological conceptualisations of lexical ambiguity and raise new challenges for understanding how contextual information shapes the meanings of words across different timescales.
Imprecise Meanings as a Cause of Uncertainty in Medical Knowledge-Based Systems
There has been a considerable amount of work on uncertainty in knowledge-based systems. This work has generally been concerned with uncertainty arising from the strength of inferences and the weight of evidence. In this paper we discuss another type of uncertainty: that which is due to imprecision in the underlying primitives used to represent the knowledge of the system. In particular, a given word may denote many similar but not identical entities. Such words are said to be lexically imprecise. Lexical imprecision has caused widespread problems in many areas. Unless this phenomenon is recognized and appropriately handled, it can degrade the performance of knowledge-based systems. In particular, it can lead to difficulties with the user interface, and with the inferencing processes of these systems. Some techniques are suggested for coping with this phenomenon.