
Collaborating Authors

 Hao, Sophie


Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success

arXiv.org Artificial Intelligence

Chomsky (1968: 3) greeted the rise of computing technology with skepticism, arguing that "the kinds of structures that are realizable in terms of [computational methods] are simply not those that must be postulated to underlie the use of language." 55 years later, Piantadosi (2023: 15) celebrated the release of ChatGPT by directing that same criticism toward generative linguistics: "the success of large language models is a failure for generative theories because it goes against virtually all of the principles these theories have espoused." Chesi (forthcoming) may not agree with Piantadosi's criticisms, but he does take them as a harbinger of scientific crisis. The minimalist program, hampered by a lack of formal and empirical rigor, has failed to produce a comprehensive, self-consistent theory of syntax. ChatGPT's apparent linguistic competence, in tandem with the success of computational accounts of gradient acceptability and online phenomena, seems to suggest that "generative linguistics no longer dictates the agenda for future linguistic challenges" (Chesi forthcoming: 2). In order to survive, Chesi warns, generativists need to make progress towards a theory that is based on precisely stated principles and evaluated on a common set of explananda. Chesi's target paper presents the current collision of worlds as a debate about the intellectual merits of generativist theories. According to Chesi, the success of generativism depends on generativists' ability to resolve their deficits of rigor, so that they can parry the theoretical attacks that language models have levied against core principles of minimalism. This response argues, contrary to Chesi's framing but consistent with current consensus in the history and sociology of science (Fleck 1935; Kuhn 1962; Mullins 1975; Latour 1984; Law & Lodge 1984), that the generativist crisis described by Piantadosi and Chesi is social in nature, and cannot be averted by intellectual means.


What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length

arXiv.org Artificial Intelligence

When comparing the linguistic capabilities of language models (LMs) with those of humans using LM probabilities, factors such as sequence length and the unigram frequency of lexical items have a significant effect on LM probabilities, in ways that humans are largely robust to. Prior work comparing LM and human acceptability judgments treats these effects uniformly across models, making the strong assumption that all models require the same degree of adjustment to control for length and unigram frequency effects. We propose MORCELA, a new linking theory between LM scores and acceptability judgments in which the optimal level of adjustment for these effects is estimated from data via learned parameters for length and unigram frequency. We first show that MORCELA outperforms SLOR (Pauls and Klein, 2012; Lau et al., 2017), a commonly used linking theory for acceptability, across two families of transformer LMs (Pythia and OPT). Furthermore, we demonstrate that the fixed adjustments for length and unigram frequency assumed by SLOR overcorrect for these confounds, and that larger models require a lower relative degree of adjustment for unigram frequency, though a significant amount of adjustment is still necessary for all models. Finally, our subsequent analysis shows that larger LMs' lower susceptibility to frequency effects can be explained by their ability to better predict rarer words in context.
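For concreteness, the sketch below contrasts SLOR, which applies fixed corrections for unigram frequency and length, with a MORCELA-style score in which the strength of those corrections is fitted to human ratings. The linear parameterization, the least-squares fitting step, and the function names are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def slor(log_p, log_p_unigram, length):
        # SLOR (Pauls & Klein, 2012; Lau et al., 2017): subtract the unigram
        # log probability and normalize by length -- the same fixed
        # corrections applied to every model.
        return (log_p - log_p_unigram) / length

    def morcela_style(log_p, log_p_unigram, length, beta, gamma):
        # Assumed MORCELA-style score: the degree of correction for unigram
        # frequency (beta) and length (gamma) is learned per model rather
        # than fixed in advance.
        return log_p - beta * log_p_unigram - gamma * length

    def fit_adjustments(log_p, log_p_unigram, length, ratings):
        # Estimate the corrections by regressing human acceptability ratings
        # on the LM log probability, the unigram log probability, and length,
        # then expressing the latter two relative to the former. This is a
        # simple stand-in for however the parameters are actually estimated.
        X = np.column_stack([log_p, log_p_unigram, length,
                             np.ones_like(length, dtype=float)])
        coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
        a, b, c, _ = coef
        return -b / a, -c / a  # beta, gamma

Under this reading, SLOR applies one uniform correction to every model, whereas the fitted coefficients can differ across models, which is what allows the adjustment for unigram frequency to shrink as models grow larger.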


ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors

arXiv.org Artificial Intelligence

In languages without orthographic word boundaries, NLP models perform word segmentation, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, ERAS, that tests a model's vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word segmentation step in the pipeline. Our results indicate that models' segmentation of Chinese text often fails to account for morphosyntactic context.
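As a rough illustration of the kind of ambiguity the benchmark targets, the sketch below checks whether an off-the-shelf segmenter recovers the word that sentence-level context requires in a locally ambiguous sentence versus an unambiguous control. The example sentence pair, the choice of jieba, and the error criterion are illustrative assumptions, not material drawn from ERAS.

    import jieba  # off-the-shelf Chinese word segmenter

    # Locally ambiguous: "研究生" (graduate student) is a plausible word at
    # the start of "研究生命", but the sentence-level reading requires
    # "研究 / 生命" ("studying the origin of life"). The control sentence
    # swaps in a verb with no competing segmentation. (Hypothetical pair.)
    ambiguous = "他在研究生命的起源"
    unambiguous = "他在探索生命的起源"

    def garden_path_error(sentence, required_word="生命"):
        # Crude proxy for a morphological garden path error: the segmenter
        # fails to recover the word that morphosyntactic context requires.
        return required_word not in jieba.lcut(sentence)

    for s in (ambiguous, unambiguous):
        print(jieba.lcut(s), garden_path_error(s))

A model that behaves consistently on such pairs is robust in ERAS's terms; one that errs only on the locally ambiguous member of the pair is making a garden path error.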


Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels

arXiv.org Artificial Intelligence

Inspired by the concept of the male gaze (Mulvey, 1975) in literature and media studies, this paper proposes a framework for analyzing gender bias in terms of female objectification: the extent to which a text portrays female individuals as objects of visual pleasure. Our framework measures female objectification along two axes. First, we compute an agency bias score that indicates whether male entities are more likely to appear in the text as grammatical agents than female entities. Next, by analyzing the word embedding space induced by a text (Caliskan et al., 2017), we compute an appearance bias score that indicates whether female entities are more closely associated with appearance-related words than male entities. Applying our framework to 19th and 20th century novels reveals evidence of female objectification in literature: we find that novels written from a male perspective systematically objectify female characters, while novels written from a female perspective do not exhibit statistically significant objectification of any gender.
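The appearance bias score builds on the word embedding association test of Caliskan et al. (2017). A minimal sketch of such a differential association is given below; the word lists and the assumption that the text's embeddings come as a word-to-vector mapping are illustrative, not the paper's exact setup.

    import numpy as np

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    def mean_association(word, attribute_words, emb):
        # Mean cosine similarity between one target word and a set of
        # attribute words, skipping out-of-vocabulary items.
        vecs = [emb[a] for a in attribute_words if a in emb]
        return np.mean([cosine(emb[word], v) for v in vecs])

    def appearance_bias(emb, female_terms, male_terms, appearance_terms):
        # WEAT-style differential association (after Caliskan et al., 2017):
        # positive values indicate that female terms sit closer to
        # appearance-related words than male terms do in the embedding
        # space induced by the text.
        f = np.mean([mean_association(w, appearance_terms, emb)
                     for w in female_terms if w in emb])
        m = np.mean([mean_association(w, appearance_terms, emb)
                     for w in male_terms if w in emb])
        return f - m

    # Hypothetical lexicons; the paper's actual word lists may differ.
    female_terms = ["she", "her", "woman", "girl", "lady"]
    male_terms = ["he", "him", "man", "boy", "gentleman"]
    appearance_terms = ["beautiful", "pretty", "lovely", "slender", "fair"]

The agency bias score would be computed separately, for example by comparing how often male and female entities appear as grammatical agents of verbs; that step is omitted from this sketch.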