The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works

Bourgois, Antoine, Poibeau, Thierry

arXiv.org Artificial Intelligence

While coreference resolution is attracting more interest than ever from computational literature researchers, representative datasets of fully annotated long documents remain surprisingly scarce. In this paper, we introduce a new annotated corpus of three full-length French novels, totaling over 285,000 tokens. Unlike previous datasets focused on shorter texts, our corpus addresses the challenges posed by long, complex literary works, enabling evaluation of coreference models in the context of long reference chains. We present a modular coreference resolution pipeline that allows for fine-grained error analysis. We show that our approach is competitive and scales effectively to long documents. Finally, we demonstrate its usefulness for inferring the gender of fictional characters, showcasing its relevance for both literary analysis and downstream NLP tasks.
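The abstract does not specify how gender is inferred from the resolved chains; a minimal sketch of one plausible approach is to count gendered French subject pronouns among a character's mentions. The pronoun sets, the sample chain, and the majority-vote rule are all illustrative assumptions, not the authors' method:

```python
# Infer a fictional character's likely grammatical gender from the
# pronouns in its coreference chain (hypothetical illustration).
FEMININE = {"elle", "elles"}
MASCULINE = {"il", "ils"}

def infer_gender(chain):
    """chain: list of lowercased mention strings for one entity."""
    fem = sum(1 for m in chain if m in FEMININE)
    masc = sum(1 for m in chain if m in MASCULINE)
    if fem > masc:
        return "feminine"
    if masc > fem:
        return "masculine"
    return "unknown"

print(infer_gender(["emma", "elle", "elle", "la jeune femme"]))  # feminine
```

In practice one would also exploit gendered nominal mentions ("la jeune femme") and agreement morphology, but a pronoun majority vote already conveys the idea.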


Coreference as an indicator of context scope in multimodal narrative

Ilinykh, Nikolai, Lappin, Shalom, Sayeed, Asad, Loáiciga, Sharid

arXiv.org Artificial Intelligence

We demonstrate that large multimodal language models differ substantially from humans in the distribution of coreferential expressions in a visual storytelling task. We introduce a number of metrics to quantify the characteristics of coreferential patterns in both human- and machine-written texts. Humans distribute coreferential expressions in a way that maintains consistency across texts and images, interleaving references to different entities in a highly varied way. Machines are less able to track mixed references, despite achieving perceived improvements in generation quality.


Data-driven Coreference-based Ontology Building

Ashury-Tahan, Shir, Cohen, Amir David Nissan, Cohen, Nadav, Louzoun, Yoram, Goldberg, Yoav

arXiv.org Artificial Intelligence

While coreference resolution is traditionally used as a component in individual document understanding, in this work we take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations present in a large corpus. We derive coreference chains from a corpus of 30 million biomedical abstracts and construct a graph based on the string phrases within these chains, establishing connections between phrases if they co-occur within the same coreference chain. We then use the graph structure and the betweenness centrality measure to distinguish between edges denoting hierarchy, identity and noise, assign directionality to edges denoting hierarchy, and split nodes (strings) that correspond to multiple distinct concepts. The result is a rich, data-driven ontology over concepts in the biomedical domain, parts of which overlap significantly with human-authored ontologies. We release the coreference chains and resulting ontology under a Creative Commons license, along with the code.
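The chain-to-graph construction step can be sketched in plain Python: two phrases are connected if they co-occur in a chain, with edge weight equal to the number of chains in which they co-occur. The sample chains are invented, and the later betweenness-centrality analysis described in the abstract is omitted here:

```python
from collections import Counter
from itertools import combinations

def chains_to_graph(chains):
    """Build a weighted co-occurrence graph over phrases.

    chains: iterable of coreference chains, each a list of phrase strings.
    Returns a Counter mapping sorted phrase pairs to co-occurrence counts.
    """
    edges = Counter()
    for chain in chains:
        # Deduplicate phrases within a chain, then connect every pair.
        for a, b in combinations(sorted(set(chain)), 2):
            edges[(a, b)] += 1
    return edges

chains = [
    ["aspirin", "the drug", "it"],
    ["ibuprofen", "the drug"],
    ["aspirin", "the drug"],
]
graph = chains_to_graph(chains)
print(graph[("aspirin", "the drug")])  # 2: co-occur in two chains
```

On the resulting graph, a library such as NetworkX could then compute betweenness centrality to separate hierarchy edges from identity and noise edges, as the paper describes.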


Generating Visual Stories with Grounded and Coreferent Characters

Liu, Danyang, Lapata, Mirella, Keller, Frank

arXiv.org Artificial Intelligence

Characters are important in narratives. They move the plot forward, create emotional connections, and embody the story's themes. Existing visual storytelling methods focus on the plot and the events relating to it, without building the narrative around specific characters. As a result, the generated stories feel generic, with character mentions being absent, vague, or incorrect. To mitigate these issues, we introduce the new task of character-centric story generation and present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. Specifically, we develop an automated pipeline to enrich VIST with visual and textual character coreference chains. We also propose new evaluation metrics to measure the richness of characters and coreference in stories. Experimental results show that our model generates stories with recurring characters which are consistent and coreferent to a larger extent than baselines and state-of-the-art systems.


Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD

Pražák, Ondřej, Konopík, Miloslav

arXiv.org Artificial Intelligence

Coreference resolution is the task of identifying language expressions that refer to the same real-world entity (antecedent) within a text. These coreferential expressions can sometimes appear within a single sentence, but often they are spread across multiple sentences. In some challenging cases, it is necessary to consider the entire document to determine whether two expressions refer to the same entity. The task can be divided into two main subtasks: identifying entity mentions and grouping these mentions based on the real-world entities they refer to. Coreference resolution is closely related to anaphora resolution, as discussed in [2]. Historically, coreference resolution was a standard preprocessing step in various natural language processing (NLP) tasks, such as machine translation, summarization, and information extraction. Although recent large language models have achieved state-of-the-art results in coreference resolution, they are expensive to train and deploy, and traditional (discriminative) approaches remain competitive. Expressing this task in natural language is challenging, and to the best of our knowledge, there have been no successful attempts to utilize large chatbots (like ChatGPT-4) to achieve superior results. Coreference resolution becomes particularly challenging in low-resource languages. One strategy to address this challenge is to train a multilingual model on datasets from multiple languages, thereby transferring knowledge from resource-rich languages to those with fewer resources.
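The two subtasks named above, mention identification and mention grouping, can be illustrated with a deliberately naive toy pipeline. The regex-based mention detector and the manual pronoun map are oversimplifications standing in for real models:

```python
import re

def find_mentions(text):
    # Subtask 1: mention identification. A toy heuristic: capitalized
    # tokens plus a few pronouns (real systems use trained span detectors).
    return re.findall(r"\b(?:[A-Z][a-z]+|she|he|it)\b", text)

def cluster_mentions(mentions, pronoun_links):
    # Subtask 2: grouping mentions by entity. Exact string match, plus a
    # hand-written map that stands in for actual pronoun resolution.
    clusters = {}
    for m in mentions:
        key = pronoun_links.get(m, m)
        clusters.setdefault(key, []).append(m)
    return clusters

text = "Marie opened the door. Marie smiled, and she waved."
mentions = find_mentions(text)
clusters = cluster_mentions(mentions, {"she": "Marie"})
print(clusters["Marie"])  # ['Marie', 'Marie', 'she']
```

The hard part of the task lies precisely in replacing the hand-written `pronoun_links` map with a model that decides, from context, which entity each mention refers to.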


How to Evaluate Coreference in Literary Texts?

Duron-Tejedor, Ana-Isabel, Amsili, Pascal, Poibeau, Thierry

arXiv.org Artificial Intelligence

In this short paper, we examine the main metrics used to evaluate textual coreference and we detail some of their limitations. We show that a unique score cannot represent the full complexity of the problem at stake, and is thus uninformative, or even misleading. We propose a new way of evaluating coreference, taking into account the context (in our case, the analysis of fiction, especially novels). More specifically, we propose to distinguish long coreference chains (corresponding to main characters) from short ones (corresponding to secondary characters) and singletons (isolated elements). This way, we hope to get more interpretable and thus more informative results through evaluation.
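The proposed three-way partition can be sketched as follows. The length threshold and the per-partition metric (simple mention recall here) are assumptions for illustration; the paper's actual metrics may differ:

```python
def partition_chains(chains, long_threshold=10):
    """Split gold chains into long chains, short chains, and singletons."""
    longs = [c for c in chains if len(c) >= long_threshold]
    shorts = [c for c in chains if 1 < len(c) < long_threshold]
    singletons = [c for c in chains if len(c) == 1]
    return longs, shorts, singletons

def mention_recall(gold_chains, predicted_mentions):
    """Fraction of gold mentions recovered by the system, per partition."""
    gold = [m for chain in gold_chains for m in chain]
    if not gold:
        return None
    found = sum(1 for m in gold if m in predicted_mentions)
    return found / len(gold)

gold = [["Emma"] * 12, ["the maid", "she"], ["a passerby"]]
longs, shorts, singletons = partition_chains(gold)
pred = {"Emma", "she"}
print(mention_recall(longs, pred))   # 1.0: main character fully covered
print(mention_recall(shorts, pred))  # 0.5: secondary character half covered
```

Reporting one score per partition, rather than a single aggregate, makes it visible when a system handles main characters well but drops secondary characters or singletons.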


Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis

Okulska, Inez, Wiśnios, Emilia

arXiv.org Artificial Intelligence

Adult content detection still poses a great challenge for automation. Existing classifiers primarily focus on distinguishing between erotic and non-erotic texts. However, they often lack the nuance needed to assess potential harm. Unfortunately, content of this nature falls beyond the reach of generative models due to its potentially harmful nature. Ethical restrictions prohibit large language models (LLMs) from analyzing and classifying harmful erotics, let alone generating them to create synthetic datasets for other neural models. In such instances where data is scarce and challenging, a thorough analysis of the structure of such texts, rather than a large model, may offer a viable solution. This is especially true given that harmful erotic narratives, despite appearing similar to harmless ones, usually reveal their harmful nature first through contextual information hidden in the non-sexual parts of the narrative. This paper introduces a hybrid neural and rule-based context-aware system that leverages coreference resolution to identify harmful contextual cues in erotic content. Collaborating with professional moderators, we compiled a dataset and developed a classifier capable of distinguishing harmful from non-harmful erotic content. Our hybrid model, tested on Polish text, demonstrates a promising accuracy of 84% and a recall of 80%. Models based on RoBERTa and Longformer without explicit usage of coreference chains achieved significantly weaker results, underscoring the importance of coreference resolution in detecting such nuanced content as harmful erotics. This approach also offers the potential for enhanced visual explainability, supporting moderators in evaluating predictions and taking necessary actions to address harmful content.


DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs

Xiong, Yiyun, Dai, Mengwei, Li, Fei, Fei, Hao, Li, Bobo, Wu, Shengqiong, Ji, Donghong, Teng, Chong

arXiv.org Artificial Intelligence

Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers from the frequent occurrence of personal pronouns, i.e., entity and speaker coreference. This work introduces a new benchmark dataset, DialogRE^C+, introducing coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning of argument relations is expected to be enhanced. In the DialogRE^C+ dataset, we manually annotate a total of 5,068 coreference chains over 36,369 argument mentions based on the existing DialogRE data, where four different coreference chain types, namely speaker, person, location and organization chains, are explicitly marked. We further develop 4 coreference-enhanced graph-based DRE models, which learn effective coreference representations for improving the DRE task. We also train a coreference resolution model based on our annotations and evaluate the effect of automatically extracted coreference chains, demonstrating the practicality of our dataset and its potential for other domains and tasks.


BenCoref: A Multi-Domain Dataset of Nominal Phrases and Pronominal Reference Annotations

Rohan, Shadman, Hossain, Mojammel, Rashid, Mohammad Mamun Or, Mohammed, Nabeel

arXiv.org Artificial Intelligence

Coreference resolution is a well-studied problem in NLP. While widely studied for English and other resource-rich languages, coreference resolution in Bengali remains largely unexplored due to the absence of relevant datasets. Bengali, being a low-resource language, exhibits greater morphological richness compared to English. In this article, we introduce a new dataset, BenCoref, comprising coreference annotations for Bengali texts gathered from four distinct domains. This relatively small dataset contains 5,200 mention annotations forming 502 mention clusters within 48,569 tokens. We describe the process of creating this dataset and report the performance of multiple models trained using BenCoref. We expect that our work provides some valuable insights on the variations in coreference phenomena across several domains in Bengali and encourages the development of additional resources for Bengali. Furthermore, we found poor cross-lingual performance in the zero-shot setting from English, highlighting the need for more language-specific resources for this task.


Parallel Data Helps Neural Entity Coreference Resolution

Tang, Gongbo, Hardmeier, Christian

arXiv.org Artificial Intelligence

Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data, but annotating coreference is expensive and challenging. Hardmeier et al. (2013) have shown that parallel data contains latent anaphoric knowledge, but this knowledge has not yet been exploited in end-to-end neural models. In this paper, we propose a simple yet effective model to exploit coreference knowledge from parallel data. In addition to the conventional modules learning coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge. Our proposed cross-lingual model achieves consistent improvements, up to 1.74 percentage points, on the OntoNotes 5.0 English dataset using 9 different synthetic parallel datasets. These experimental results confirm that parallel data can provide additional coreference knowledge which is beneficial to coreference resolution tasks.