Anaphor


DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking

Bu, Lanni, Levine, Lauren, Zeldes, Amir

arXiv.org Artificial Intelligence

Recent LLM benchmarks have tested models on a range of phenomena, but are still focused primarily on natural language understanding for extraction of explicit information, such as QA or summarization, with responses often targeting information from individual sentences. We are still lacking more challenging, and importantly also multilingual, benchmarks focusing on implicit information and pragmatic inferences across larger documents in the context of discourse tracking: integrating and aggregating information across sentences, paragraphs and multiple speaker utterances. To this end, we present DiscoTrack, an LLM benchmark targeting a range of tasks across 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations and bridging inference. Our evaluation shows that these tasks remain challenging, even for state-of-the-art models.
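Multilingual benchmarks of this kind are typically scored by aggregating per-response correctness over each language and task. A minimal sketch of such an aggregation step (data and field names hypothetical, not DiscoTrack's actual format):

```python
from collections import defaultdict

def aggregate_accuracy(results):
    """results: list of (language, task, correct) triples, where correct
    is True/False for one model response. Returns accuracy per (lang, task)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for lang, task, correct in results:
        totals[(lang, task)] += 1
        hits[(lang, task)] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}

# Hypothetical responses across two languages and two discourse tasks
results = [
    ("en", "salience", True), ("en", "salience", False),
    ("de", "bridging", True), ("de", "bridging", True),
]
print(aggregate_accuracy(results))
```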


Subjectivity in the Annotation of Bridging Anaphora

Levine, Lauren, Zeldes, Amir

arXiv.org Artificial Intelligence

Bridging refers to the associative relationship between inferable entities in a discourse and the antecedents which allow us to understand them, such as understanding what "the door" means with respect to an aforementioned "house". As identifying associative relations between entities is an inherently subjective task, it is difficult to achieve consistent agreement in the annotation of bridging anaphora and their antecedents. In this paper, we explore the subjectivity involved in the annotation of bridging instances at three levels: anaphor recognition, antecedent resolution, and bridging subtype selection. To do this, we conduct an annotation pilot on the test set of the existing GUM corpus, and propose a newly developed classification system for bridging subtypes, which we compare to previously proposed schemes. Our results suggest that some previous resources are likely to be severely under-annotated. We also find that while agreement on the bridging subtype category was moderate, annotator overlap for exhaustively identifying instances of bridging was low, and that many disagreements resulted from subjective understanding of the entities involved.
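Chance-corrected agreement between annotators on categorical decisions like subtype selection is commonly measured with Cohen's kappa. A small self-contained sketch, using made-up subtype labels rather than the pilot's actual annotations:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical subtype labels from two annotators on six shared anaphors
a = ["part-whole", "set-member", "part-whole", "other", "set-member", "part-whole"]
b = ["part-whole", "set-member", "other", "other", "set-member", "set-member"]
print(round(cohens_kappa(a, b), 3))  # → 0.52
```

Note that kappa only applies once both annotators have marked the same anaphors; the low overlap in exhaustive identification reported above is a separate recall-style problem that kappa does not capture.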


Unifying the Scope of Bridging Anaphora Types in English: Bridging Annotations in ARRAU and GUM

Levine, Lauren, Zeldes, Amir

arXiv.org Artificial Intelligence

Comparing bridging annotations across coreference resources is difficult, largely due to a lack of standardization across definitions and annotation schemas and narrow coverage of disparate text domains across resources. To alleviate domain coverage issues and consolidate schemas, we compare guidelines and use interpretable predictive models to examine the bridging instances annotated in the GUM, GENTLE and ARRAU corpora. Examining these cases, we find that there is a large difference in types of phenomena annotated as bridging. Beyond theoretical results, we release a harmonized, subcategorized version of the test sets of GUM, GENTLE and the ARRAU Wall Street Journal data to promote meaningful and reliable evaluation of bridging resolution across domains.


Anaphor Assisted Document-Level Relation Extraction

Lu, Chonggang, Zhang, Richong, Sun, Kai, Kim, Jaein, Zhang, Cunwang, Mao, Yongyi

arXiv.org Artificial Intelligence

Document-level relation extraction (DocRE) involves identifying relations between entities distributed in multiple sentences within a document. Existing methods focus on building a heterogeneous document graph to model the internal structure of an entity and the external interaction between entities. However, there are two drawbacks in existing methods. On one hand, anaphors play an important role in reasoning to identify relations between entities but are ignored by these methods. On the other hand, these methods achieve cross-sentence entity interactions implicitly by utilizing a document or sentences as intermediate nodes. Such an approach has difficulty learning fine-grained interactions between entities across different sentences, resulting in sub-optimal performance. To address these issues, we propose an Anaphor-Assisted (AA) framework for DocRE tasks. Experimental results on the widely-used datasets demonstrate that our model achieves a new state-of-the-art performance.
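The document-graph idea can be sketched minimally: mentions and anaphors become nodes, same-sentence co-occurrence adds edges, and anaphor-antecedent links let information flow between entities in different sentences. The node and edge scheme below is illustrative, not the AA framework's exact design:

```python
# Mentions/anaphors are node names; edges come from same-sentence
# co-occurrence plus anaphor-antecedent links (illustrative scheme only).

def build_graph(sentences, anaphor_links):
    """sentences: list of lists of node names (mentions/anaphors per sentence).
    anaphor_links: list of (anaphor, antecedent) pairs.
    Returns an undirected adjacency dict."""
    graph = {}

    def connect(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for nodes in sentences:
        for i, u in enumerate(nodes):
            graph.setdefault(u, set())
            for v in nodes[i + 1:]:
                connect(u, v)
    for anaphor, antecedent in anaphor_links:
        connect(anaphor, antecedent)
    return graph

doc = [["Marie", "the institute"], ["she", "the lab"]]
g = build_graph(doc, [("she", "Marie")])
# "she" now bridges "Marie" (sentence 1) and "the lab" (sentence 2),
# giving a two-hop path between entities that never share a sentence.
```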


End-to-End Neural Discourse Deixis Resolution in Dialogue

Li, Shengjie, Ng, Vincent

arXiv.org Artificial Intelligence

We adapt Lee et al.'s (2018) span-based entity coreference model to the task of end-to-end discourse deixis resolution in dialogue, specifically by proposing extensions to their model that exploit task-specific characteristics. The resulting model, dd-utt, achieves state-of-the-art results on the four datasets in the CODI-CRAC 2021 shared task.


A Mention-Ranking Model for Abstract Anaphora Resolution

Marasović, Ana, Born, Leo, Opitz, Juri, Frank, Anette

arXiv.org Machine Learning

Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence--antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and -- if disregarding syntax -- discriminates candidates using deeper features.
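The core of a mention-ranking approach is to score each candidate antecedent against the anaphoric sentence and select the argmax. As a toy stand-in for the paper's learned LSTM-Siamese scorer, the sketch below uses cosine similarity over hand-made vectors; names and numbers are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_antecedents(anaphor_vec, candidates):
    """candidates: dict mapping candidate id -> vector.
    Returns the id of the highest-scoring candidate antecedent."""
    return max(candidates, key=lambda c: cosine(anaphor_vec, candidates[c]))

# Toy representations of an anaphoric sentence and two candidate clauses
anaphor = [0.9, 0.1, 0.3]
cands = {"clause_1": [0.8, 0.2, 0.4], "clause_2": [0.1, 0.9, 0.0]}
print(rank_antecedents(anaphor, cands))  # → clause_1
```

In the actual model the similarity function is learned, and training pairs are generated artificially as described above; the ranking-then-argmax structure is the part this sketch illustrates.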


What sort of taxonomy of causation do we need for language understanding?

Wilks, Y. A.

Classics

This paper describes an investigation of the feasibility of resolving anaphors in natural language texts by means of a 'shallow processing' approach which exploits knowledge of syntax, semantics and local focussing as heavily as possible; it does not rely on the presence of large amounts of world or domain knowledge, which are notoriously hard to process accurately. The ideas reported are implemented in a program called SPAR (Shallow Processing Anaphor Resolver), which resolves anaphoric ambiguities in simple English stories and generates sentence-by-sentence paraphrases that show what interpretations have been selected. To resolve anaphors, SPAR combines and develops several existing techniques, most notably Sidner's theory of local focussing and Wilks' 'preference semantics' theory of semantics and common sense inference. Consideration of the need to resolve several anaphors in the same sentence results in Sidner's framework being modified and extended to allow focus-based processing to interact more flexibly with processing based on other types of knowledge. Wilks' treatment of common sense inference is extended to incorporate a wider range of types of inference without jeopardizing its uniformity and simplicity. In the absence of large quantities of world knowledge, successful anaphor resolution is seen to depend on the coordination of predictions made by system components exploiting various knowledge sources.