Modelling Child Learning and Parsing of Long-range Syntactic Dependencies
Mahon, Louis, Johnson, Mark, Steedman, Mark
This work develops a probabilistic child language acquisition model to learn a range of linguistic phenomena, most notably long-range syntactic dependencies of the sort found in object wh-questions, among other constructions. The model is trained on a corpus of real child-directed speech, where each utterance is paired with a logical form as a meaning representation. It then learns both word meanings and language-specific syntax simultaneously. After training, the model can deduce the correct parse tree and word meanings for a given utterance-meaning pair, and can infer the meaning if given only the utterance. The successful modelling of long-range dependencies is theoretically important because it exploits aspects of the model that are, in general, trans-context-free.
A Language-agnostic Model of Child Language Acquisition
Mahon, Louis, Abend, Omri, Berger, Uri, Demuth, Katherine, Johnson, Mark, Steedman, Mark
This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology of Hebrew, make learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.
DisCoCat for Donkey Sentences
McPheat, Lachlan, Wang, Daphne
Montague semantics is a compositional method for translating the semantics of written language into first-order logic. As a simple example, one can understand the meaning of the sentence "(all) dogs eat snacks" as ∀x, y. dogs(x) ∧ snacks(y) → eats(x, y). However, when translating the meaning of the sentence "Every farmer who owns a donkey beats it", the variable representing the donkey cannot be bound by the existential quantifier coming from the determiner 'a'. This issue was studied by Geach [4], who used it as a counterexample to the scope of Montague semantics. Many have created systems that form semantic representations of donkey sentences. To name a few: dynamic predicate logic [7], where the binding rules of quantifiers in first-order logic are relaxed; discourse representation theory [11], where a collection of 'discourse referents' keeps track of individuals' mentions, and referents are identified to keep track of references; and an approach using dependent type theory [18], which exploits dependent sums to differentiate between ambiguous readings of donkey sentences. However, none of the models mentioned above are type-logical grammars, which raises the question of whether it is possible to parse donkey sentences and form usable representations of them using type-logical grammars. We propose to model donkey sentences using (an extension of) Lambek calculus, L. In the following section, we explain how a type-logical analysis of natural language works, and in sections 1.3, 1.4, and 1.5 how to extend it to model more exotic linguistic phenomena, culminating in a parse of a donkey sentence. Then we introduce relational semantics and vector space semantics of the extended Lambek calculus in sections 3.1 and 3.3 respectively, demonstrating how a donkey sentence is interpreted as a relation or as a linear map.
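As a hedged illustration of the binding problem this abstract describes, the two formulas below contrast a naive compositional translation of "Every farmer who owns a donkey beats it" with its intended reading. The predicate names are illustrative and the formulas are the standard textbook renderings, not taken from the paper. In the first formula the existential quantifier closes before the pronoun is reached, so the y in beats(x, y) is left free; the second binds it by quantifying universally over donkeys.

```latex
% Naive compositional translation: the final y is outside the scope of \exists y
\[
\forall x\, \Bigl( \bigl( \mathit{farmer}(x) \land \exists y\, ( \mathit{donkey}(y) \land \mathit{owns}(x, y) ) \bigr) \to \mathit{beats}(x, y) \Bigr)
\]

% Intended reading: y is bound by a universal quantifier with wide scope
\[
\forall x\, \forall y\, \Bigl( \bigl( \mathit{farmer}(x) \land \mathit{donkey}(y) \land \mathit{owns}(x, y) \bigr) \to \mathit{beats}(x, y) \Bigr)
\]
```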
Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality
McPheat, Lachlan, Sadrzadeh, Mehrnoosh, Wazni, Hadi, Wijnholds, Gijs
We develop a categorical compositional distributional semantics for Lambek Calculus with a Relevant Modality !L*, which has a limited version of the contraction and permutation rules. The categorical part of the semantics is a monoidal biclosed category with a coalgebra modality, very similar to the structure of a Differential Category. We instantiate this category to finite dimensional vector spaces and linear maps via "quantisation" functors and work with three concrete interpretations of the coalgebra modality. We apply the model to construct categorical and concrete semantic interpretations for the motivating example of !L*: the derivation of a phrase with a parasitic gap. The effectiveness of the concrete interpretations is evaluated via a disambiguation task, on an extension of a sentence disambiguation dataset to parasitic gap phrases, using BERT, Word2Vec, and FastText vectors and Relational tensors.
Classical Copying versus Quantum Entanglement in Natural Language: The Case of VP-ellipsis
Wijnholds, Gijs, Sadrzadeh, Mehrnoosh
This paper compares classical copying and quantum entanglement in natural language by considering the case of verb phrase (VP) ellipsis. VP ellipsis is a non-linear linguistic phenomenon that requires the reuse of resources, making it the ideal test case for a comparative study of different copying behaviours in compositional models of natural language. Following the line of research in compositional distributional semantics set out by Coecke et al. (2010), we develop an extension of the Lambek calculus which admits a controlled form of contraction to deal with the copying of linguistic resources. We then develop two different compositional models of distributional meaning for this calculus. In the first model, we follow the categorical approach of Coecke et al. (2013), in which a functorial passage sends the proofs of the grammar to linear maps on vector spaces, and we use Frobenius algebras to allow for copying. In the second model, we follow the more traditional approach that one finds in categorial grammars, whereby an intermediate step interprets proofs as non-linear lambda terms, using multiple variable occurrences that model classical copying. As a case study, we apply the models to derive different readings of ambiguous elliptical phrases and compare the analyses that each model provides.
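To make the contrast between the two kinds of copying concrete, here is a minimal sketch. It is not the paper's models: the function names are hypothetical, vectors are plain Python lists over a fixed basis, and "classical copying" is assumed here to mean duplicating the whole vector. The Frobenius copy map sends each basis vector e_i to e_i ⊗ e_i and is extended linearly, so a vector v becomes the diagonal matrix diag(v); duplicating the whole vector gives the outer product v ⊗ v, which is not linear in v.

```python
def frobenius_copy(v):
    """Linear copy map of a Frobenius algebra over a fixed basis.

    Sends e_i to e_i (x) e_i and extends linearly, so v maps to diag(v).
    (Hypothetical helper name, for illustration only.)
    """
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def classical_copy(v):
    """Non-linear duplication of the whole vector: v maps to v (x) v.

    (Assumed stand-in for classical copying, for illustration only.)
    """
    return [[a * b for b in v] for a in v]


v = [1.0, 2.0]
print(frobenius_copy(v))  # [[1.0, 0.0], [0.0, 2.0]]
print(classical_copy(v))  # [[1.0, 2.0], [2.0, 4.0]]
```

The two results agree on basis vectors but differ on any vector with more than one non-zero entry, which is one way to see why the two approaches can yield different analyses.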