coreferent
Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context
Bartl, Marion, Murphy, Thomas Brendan, Leavy, Susan
Gender-inclusive language is often used with the aim of ensuring that all individuals, regardless of gender, can be associated with certain concepts. While psycholinguistic studies have examined its effects in relation to human cognition, it remains unclear how Large Language Models (LLMs) process gender-inclusive language. Given that commercial LLMs are gaining an increasingly strong foothold in everyday applications, it is crucial to examine whether LLMs in fact interpret gender-inclusive language neutrally, because the language they generate has the potential to influence the language of their users. This study examines whether LLM-generated coreferent terms align with a given gender expression or reflect model biases. Adapting psycholinguistic methods from French to English and German, we find that in English, LLMs generally maintain the antecedent's gender but exhibit underlying masculine bias. In German, this bias is much stronger, overriding all tested gender-neutralization strategies.
$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems
Ahmed, Shafiuddin Rehan, Nath, Abhijnan, Martin, James H., Krishnaswamy, Nikhil
Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic operations needed. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref
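The filtering idea in step (a) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mentions are given as hypothetical `(mention_id, trigger_lemma)` tuples and uses exact, case-insensitive lemma equality as the heuristic. Grouping mentions by lemma first means pairs are only enumerated inside each group, so mentions with different lemmas are never compared.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(mentions):
    """Return mention-id pairs whose trigger lemmas match.

    `mentions` is a list of (mention_id, trigger_lemma) tuples. Both this
    structure and the exact-match rule are illustrative assumptions.
    """
    # Bucket mentions by normalized lemma in a single pass.
    buckets = defaultdict(list)
    for mention_id, lemma in mentions:
        buckets[lemma.lower()].append(mention_id)

    # Only pairs inside the same bucket survive the filter.
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(ids, 2))
    return pairs

mentions = [("m1", "attack"), ("m2", "Attack"), ("m3", "election"), ("m4", "attack")]
pairs = candidate_pairs(mentions)
print(pairs)
```

The surviving pairs would then be passed to a trained pairwise classifier; drawing a balanced sample of coreferent and non-coreferent pairs for training is a separate step not shown here.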
Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns?
Michaelov, James A., Bergen, Benjamin K.
Some languages allow arguments to be omitted in certain contexts. Yet human language comprehenders reliably infer the intended referents of these zero pronouns, in part because they construct expectations about which referents are more likely. We ask whether Neural Language Models also extract the same expectations. We test whether 12 contemporary language models display expectations that reflect human behavior when exposed to sentences with zero pronouns from five behavioral experiments conducted in Italian by Carminati (2005). We find that three models - XGLM 2.9B, 4.5B, and 7.5B - capture the human behavior from all the experiments, with others successfully modeling some of the results. This result suggests that human expectations about coreference can be derived from exposure to language, and also indicates features of language models that allow them to better reflect human behavior.
Selection Collider Bias in Large Language Models
In this paper we motivate the causal mechanisms behind sample selection induced collider bias (selection collider bias) that can cause Large Language Models (LLMs) to learn unconditional dependence between entities that are unconditionally independent in the real world. We show that selection collider bias can become amplified in underspecified learning tasks, and although difficult to overcome, we describe a method to exploit the resulting spurious correlations for determination of when a model may be uncertain about its prediction. We demonstrate an uncertainty metric that matches human uncertainty in tasks with gender pronoun underspecification on an extended version of the Winogender Schemas evaluation set, and we provide an online demo where users can apply our uncertainty metric to their own texts and models.
Marmara Turkish Coreference Corpus and Coreference Resolution Baseline
Schüller, Peter, Cıngıllı, Kübra, Tunçer, Ferit, Sürmeli, Barış Gün, Pekel, Ayşegül, Karatay, Ayşe Hande, Karakaş, Hacer Ezgi
Coreference Resolution is the task of identifying groups of phrases in a text that refer to the same discourse entity. Such referring phrases are called mentions; a set of mentions that all refer to the same discourse entity is called a coreference chain. Annotated corpora are important resources for developing and evaluating automatic coreference resolution methods. Turkish is an agglutinative language, and Turkish coreference resolution poses several challenges that differ from those of many other languages, in particular the absence of grammatical gender, the possibility of null pronouns in subject and object position, possessive pronouns that can be expressed as suffixes, and ambiguities among possessive and number morphemes, e.g., 'çocukları' can be analysed as 'their children' or as 'his/her children', depending on context (Oflazer and Bozşahin, 1994). No coreference resolution corpus exists for Turkish so far. We here describe the result of an effort to create such a corpus based on the METU-Sabanci Turkish Treebank (Say, Zeyrek, Oflazer, and Özge, 2004; Atalay, Oflazer, and Say, 2003; Oflazer, Say, Hakkani-Tür, and Tür, 2003), which is, to the best of our knowledge, the only publicly available Turkish Treebank. Our contributions are as follows.
Joint Inference over a Lightly Supervised Information Extraction Pipeline: Towards Event Coreference Resolution for Resource-Scarce Languages
Chen, Chen (University of Texas at Dallas) | Ng, Vincent (University of Texas at Dallas)
We address two key challenges in end-to-end event coreference resolution research: (1) the error propagation problem, where an event coreference resolver has to assume as input the noisy outputs produced by its upstream components in the standard information extraction (IE) pipeline; and (2) the data annotation bottleneck, where manually annotating data for all the components in the IE pipeline is prohibitively expensive. This is the case in the vast majority of the world's natural languages, where such annotated resources are not readily available. To address these problems, we propose to perform joint inference over a lightly supervised IE pipeline, where all the models are trained using either active learning or unsupervised learning. Using our approach, only 25% of the training sentences in the Chinese portion of the ACE 2005 corpus need to be annotated with entity and event mentions in order for our event coreference resolver to surpass its fully supervised counterpart in performance.
Modeling the Lifespan of Discourse Entities with Application to Coreference Resolution
de Marneffe, Marie-Catherine, Recasens, Marta, Potts, Christopher
A discourse typically involves numerous entities, but few are mentioned more than once. Distinguishing those that die out after just one mention (singleton) from those that lead longer lives (coreferent) would dramatically simplify the hypothesis space for coreference resolution models, leading to increased performance. To realize these gains, we build a classifier for predicting the singleton/coreferent distinction. The model's feature representations synthesize linguistic insights about the factors affecting discourse entity lifespans (especially negation, modality, and attitude predication) with existing results about the benefits of surface (part-of-speech and n-gram-based) features for coreference resolution. The model is effective in its own right, and the feature representations help to identify the anchor phrases in bridging anaphora as well. Furthermore, incorporating the model into two very different state-of-the-art coreference resolution systems, one rule-based and the other learning-based, yields significant performance improvements.
Acquiring Domain Specific Knowledge and Coreference Cues for Coreference Resolution
Gilbert, Nathan (University of Utah)
Current Coreference Resolution systems utilize a broad range of general knowledge features to make resolutions in a general setting. These approaches ignore coreference knowledge found in domain specific collections and how coreferent entities interact in different domains. This research addresses these issues by developing knowledge bases of coreference characteristics drawn from annotated and unannotated domain texts and utilizing lexical and discourse information to improve resolution.
Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution
Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers to competing approaches as well as the effectiveness of our two extensions.
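The control flow of a cluster ranker can be sketched as below. This is a toy illustration under stated assumptions, not the model from the paper: the `score` function is a hypothetical head-word heuristic standing in for a learned ranker, and `new_cluster_score` is an assumed threshold that plays the role of the "start a new entity" option, which is how anaphoricity is decided jointly with the choice of antecedent cluster.

```python
def score(cluster, mention):
    """Toy cluster-level score (an assumption, not a learned model):
    the fraction of cluster members sharing the mention's last word."""
    head = mention.split()[-1].lower()
    matches = sum(1 for m in cluster if m.split()[-1].lower() == head)
    return matches / len(cluster)

def cluster_rank(mentions, new_cluster_score=0.5):
    """Process mentions left to right, ranking all preceding clusters.

    Each mention joins the highest-scoring preceding cluster, unless no
    cluster beats `new_cluster_score`, in which case it starts a new one.
    """
    clusters = []
    for mention in mentions:
        best, best_score = None, new_cluster_score
        for cluster in clusters:
            s = score(cluster, mention)
            if s > best_score:
                best, best_score = cluster, s
        if best is None:
            clusters.append([mention])  # treated as non-anaphoric
        else:
            best.append(mention)
    return clusters

clusters = cluster_rank(["President Obama", "the senator", "Obama", "Senator"])
print(clusters)
```

Unlike a mention-pair model, the score here is computed against a whole cluster, and unlike an entity-mention classifier, the candidates compete with each other rather than being accepted or rejected independently.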
A Machine Learning Approach to Linking FOAF Instances
Sleeman, Jennifer (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County)
The friend of a friend (FOAF) vocabulary is widely used on the Web to describe individual people and their properties. Since FOAF does not require a unique ID for a person, it is not clear when two FOAF agents should be linked as co-referent, i.e., denote the same person in the world. One approach is to use the presence of inverse functional properties (e.g., foaf:mbox) as evidence that two individuals are the same. Another applies heuristics based on the string similarity of values of FOAF properties such as name and school as evidence for or against co-reference. Performance is limited, however, by many factors: non-semantic string matching, noise, changes in the world, and the lack of more sophisticated graph analytics. We describe a supervised machine learning approach that uses features defined over pairs of FOAF individuals to produce a classifier for identifying co-referent FOAF instances. We present initial results using data collected from Swoogle and other sources and describe plans for additional analysis.
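The inverse-functional-property baseline mentioned above can be sketched as follows. This is a minimal illustration, not the supervised approach the abstract proposes: agents are represented as plain dictionaries rather than RDF graphs, the agent identifiers and mailbox values are invented for the example, and the normalization (trimming and lowercasing) is an assumption.

```python
from collections import defaultdict

def link_by_ifp(agents, ifp="foaf:mbox"):
    """Group agent ids that share a value for an inverse functional property.

    `agents` maps a hypothetical agent id to a dict of FOAF properties.
    Agents without the property are left unlinked.
    """
    by_value = defaultdict(list)
    for agent_id, props in agents.items():
        value = props.get(ifp)
        if value:
            by_value[value.strip().lower()].append(agent_id)
    # Only values shared by two or more agents give co-reference evidence.
    return [sorted(ids) for ids in by_value.values() if len(ids) > 1]

agents = {
    "a1": {"foaf:name": "J. Smith", "foaf:mbox": "mailto:js@example.org"},
    "a2": {"foaf:name": "Jane Smith", "foaf:mbox": "mailto:JS@example.org"},
    "a3": {"foaf:name": "Jane Smith"},
}
links = link_by_ifp(agents)
print(links)
```

In this example `a3` stays unlinked even though its name matches `a2`, which illustrates why property-based evidence alone leaves gaps that string similarity or a learned classifier over pairs of individuals would need to fill.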