Efficient Seq2seq Coreference Resolution Using Entity Representations
Grenander, Matt, Cohen, Shay B., Steedman, Mark
Seq2seq coreference models have introduced a new paradigm for coreference resolution by learning to generate text corresponding to coreference labels, without requiring task-specific parameters. While these models achieve new state-of-the-art performance, they do so at the cost of flexibility and efficiency. In particular, they do not efficiently handle incremental settings such as dialogue, where text must be processed sequentially. We propose a compressed representation in order to improve the efficiency of these methods in incremental settings. Our method works by extracting and re-organizing entity-level tokens, and discarding the majority of other input tokens. On OntoNotes, our best model achieves just 0.6 CoNLL F1 points below a full-prefix, incremental baseline while achieving a compression ratio of 1.8. On LitBank, where singleton mentions are annotated, it surpasses state-of-the-art performance. Our results indicate that discarding a wide portion of tokens in seq2seq resolvers is a feasible strategy for incremental coreference resolution.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.05)
- North America > Dominican Republic (0.04)
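The compression idea in the abstract above can be sketched in a few lines: instead of re-encoding the full document prefix at each incremental step, keep only the tokens inside previously predicted entity mentions, grouped by entity, and discard everything else. The function name, separator tokens, and grouping scheme below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of entity-level prefix compression for incremental
# seq2seq coreference: retain mention tokens grouped by entity, drop the rest.

def compress_prefix(tokens, mentions):
    """tokens: list of str; mentions: list of (start, end, entity_id) spans."""
    by_entity = {}
    for start, end, eid in mentions:
        by_entity.setdefault(eid, []).append(tokens[start:end])
    compressed = []
    for eid in sorted(by_entity):
        compressed.append(f"<e{eid}>")          # assumed entity-separator token
        for span in by_entity[eid]:
            compressed.extend(span)
    return compressed

tokens = "John met Mary . He greeted her warmly .".split()
mentions = [(0, 1, 0), (4, 5, 0), (2, 3, 1), (6, 7, 1)]
print(compress_prefix(tokens, mentions))
# ['<e0>', 'John', 'He', '<e1>', 'Mary', 'her']
```

Here 9 prefix tokens shrink to 6, and the ratio grows with document length as non-mention tokens dominate.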
Improving LLMs' Learning for Coreference Resolution
Gan, Yujian, Liang, Yuan, Lin, Yanni, Yu, Juntao, Poesio, Massimo
Coreference Resolution (CR) is crucial for many NLP tasks, but existing LLMs struggle with hallucination and under-performance. In this paper, we investigate the limitations of existing LLM-based approaches to CR, specifically the Question-Answering (QA) Template and Document Template methods, and propose two novel techniques: Reversed Training with Joint Inference and Iterative Document Generation. Our experiments show that Reversed Training improves the QA Template method, while Iterative Document Generation eliminates hallucinations in the generated source text and boosts coreference resolution. Integrating these methods and techniques offers an effective and robust solution to LLM-based coreference resolution.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
ESNERA: Empirical and semantic named entity alignment for named entity dataset merging
Zhang, Xiaobo, He, Congqing, He, Ying, Peng, Jian, Fu, Dajie, Tan, Tien-Ping
Named Entity Recognition (NER) is a fundamental task in natural language processing. It remains a research hotspot due to its wide applicability across domains. Although recent advances in deep learning have significantly improved NER performance, they rely heavily on large, high-quality annotated datasets. However, building these datasets is expensive and time-consuming, posing a major bottleneck for further research. Current dataset merging approaches mainly focus on strategies like manual label mapping or constructing label graphs, which lack interpretability and scalability. To address this, we propose an automatic label alignment method based on label similarity. The method combines empirical and semantic similarities, using a greedy pairwise merging strategy to unify label spaces across different datasets. Experiments are conducted in two stages: first, merging three existing NER datasets into a unified corpus with minimal impact on NER performance; second, integrating this corpus with a small-scale, self-built dataset in the financial domain. The results show that our method enables effective dataset merging and enhances NER performance in the low-resource financial domain. This study presents an efficient, interpretable, and scalable solution for integrating multi-source NER corpora.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Health & Medicine (1.00)
- Information Technology (0.67)
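The greedy pairwise merging that the ESNERA abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity function and the 0.8 threshold are placeholders for the paper's combined empirical-plus-semantic score, not its actual values.

```python
# Illustrative greedy pairwise label merging: repeatedly fold the most
# similar label pair into one until no pair exceeds the threshold.
import itertools

def merge_labels(labels, sim, threshold=0.8):
    """Return a mapping from each original label to its merged label."""
    mapping = {l: l for l in labels}
    labels = list(labels)
    while True:
        pairs = [(sim(a, b), a, b) for a, b in itertools.combinations(labels, 2)]
        if not pairs:
            break
        score, a, b = max(pairs)
        if score < threshold:
            break
        labels.remove(b)                      # fold label b into label a
        for k, v in mapping.items():
            if v == b:
                mapping[k] = a
    return mapping

# Toy similarity: 1.0 when labels share a lowercase form, else 0.0.
sim = lambda a, b: 1.0 if a.lower() == b.lower() else 0.0
print(merge_labels(["PER", "per", "LOC"], sim))
# {'PER': 'PER', 'per': 'PER', 'LOC': 'LOC'}
```

A real similarity would combine annotation-driven (empirical) and embedding-driven (semantic) scores; the greedy loop itself is unchanged.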
Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition
Hamad, Nagham, Khalilia, Mohammed, Jarrar, Mustafa
We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes, following the Wojood guidelines. While Konooz is useful for various NLP tasks like domain adaptation and transfer learning, this paper primarily focuses on benchmarking existing Arabic Named Entity Recognition (NER) models, especially cross-domain and cross-dialect model performance. Our benchmarking of four Arabic NER models using Konooz reveals a significant drop in performance of up to 38% when compared to in-distribution data. Furthermore, we present an in-depth analysis of domain and dialect divergence and the impact of resource scarcity. We also measure the overlap between domains and dialects using the Maximum Mean Discrepancy (MMD) metric, and illustrate why certain NER models perform better on specific dialects and domains. Konooz is open-source and publicly available at https://sina.birzeit.edu/wojood/#download
- Africa > Sudan (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
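The Maximum Mean Discrepancy (MMD) measurement mentioned in the Konooz abstract compares the distributions of two samples; with a linear kernel it reduces to the squared distance between sample means. The sketch below uses toy Gaussian feature vectors and a linear kernel; the paper's actual kernel and features may differ.

```python
# Minimal linear-kernel MMD^2 between two samples (rows = examples):
# for a linear kernel, MMD^2 equals ||mean(X) - mean(Y)||^2.
import numpy as np

def mmd_linear(X, Y):
    """Biased linear-kernel MMD^2 between samples X and Y."""
    d = X.mean(axis=0) - Y.mean(axis=0)
    return float(d @ d)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 8))   # e.g. embeddings from one domain
Y = rng.normal(0.5, 1.0, size=(500, 8))   # a mean-shifted second domain
print(round(mmd_linear(X, Y), 2))         # near 8 * 0.5^2 = 2.0
```

A larger MMD between two domains or dialects then serves as a proxy for how far out-of-distribution one is from the other, which is the divergence analysis the abstract reports.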
An Overview and Discussion of the Suitability of Existing Speech Datasets to Train Machine Learning Models for Collective Problem Solving
Villuri, Gnaneswar, Doboli, Alex
This report characterized the suitability of existing datasets for devising new Machine Learning models, decision-making methods, and analysis algorithms to improve Collaborative Problem Solving, and then enumerated requirements for future datasets. Problem solving was assumed to be performed in teams of about three or four members who talked to each other. A dataset consists of the speech recordings of such teams. The characterization methodology was based on metrics that capture cognitive, social, and emotional activities and situations. The report presented an analysis of a large group of datasets developed for Spoken Language Understanding, a research area with some similarity to Collaborative Problem Solving.
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Solving the Challenge Set without Solving the Task: On Winograd Schemas as a Test of Pronominal Coreference Resolution
Porada, Ian, Cheung, Jackie Chi Kit
Challenge sets such as the Winograd Schema Challenge (WSC) are used to benchmark systems' ability to resolve ambiguities in natural language. If one assumes as in existing work that solving a given challenge set is at least as difficult as solving some more general task, then high performance on the challenge set should indicate high performance on the general task overall. However, we show empirically that this assumption of difficulty does not always hold. In particular, we demonstrate that despite the strong performance of prompted language models (LMs) on the WSC and its variants, these same modeling techniques perform relatively poorly at resolving certain pronominal ambiguities attested in OntoNotes and related datasets that are perceived to be easier. Motivated by these findings, we propose a method for ensembling a prompted LM with a supervised, task-specific system that is overall more accurate at resolving pronominal coreference across datasets. Finally, we emphasize that datasets involving the same linguistic phenomenon draw on distinct, but overlapping, capabilities, and evaluating on any one dataset alone does not provide a complete picture of a system's overall capability.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Education (0.67)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
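One plausible shape for the ensemble the Porada and Cheung abstract proposes is a confidence-based backoff: trust the supervised, task-specific system by default and defer to the prompted LM only when the supervised system is unconfident. The routing rule, threshold, and function names below are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of ensembling a supervised coreference system with a
# prompted LM via a confidence threshold.

def ensemble_antecedent(pronoun, candidates, supervised, prompted_lm, tau=0.9):
    """Pick an antecedent for `pronoun` among `candidates`.

    supervised(pronoun, candidates) -> (antecedent, confidence in [0, 1])
    prompted_lm(pronoun, candidates) -> antecedent
    """
    antecedent, confidence = supervised(pronoun, candidates)
    if confidence >= tau:
        return antecedent          # supervised system is confident: keep it
    return prompted_lm(pronoun, candidates)

# Toy components standing in for real models:
supervised = lambda p, c: (c[0], 0.95) if p == "he" else (c[0], 0.3)
prompted_lm = lambda p, c: c[-1]
print(ensemble_antecedent("he", ["John", "the trophy"], supervised, prompted_lm))
# John
print(ensemble_antecedent("it", ["John", "the trophy"], supervised, prompted_lm))
# the trophy
```

The intuition matches the abstract's finding: the supervised system handles the common OntoNotes-style cases, while the LM covers the Winograd-style ambiguities it resolves well.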
Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends
Martinelli, Giuliano, Barba, Edoardo, Navigli, Roberto
Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed - yet simple - pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining a 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China > Hong Kong (0.04)
- Law (0.68)
- Education (0.67)
- Government (0.46)
Description Boosting for Zero-Shot Entity and Relation Classification
Picco, Gabriele, Fuchs, Leopold, Galindo, Marcos Martínez, Purpura, Alberto, López, Vanessa, Lam, Hoang Thanh
Named Entity Recognition (NER) and Relation Extraction (RE) allow for the extraction and categorization of structured data from unstructured text, which in turn enables not only more accurate entity recognition and relationship extraction, but also getting data from several unstructured sources, helping to build knowledge graphs and the semantic web. However, these methods usually rely on labeled data (usually human-annotated data) for a good performance, usually requiring domain experts for data acquisition and labeling, which may ... For entity recognition - including classification and linking - and relation classification problems, recent ZSL methods (Aly et al., 2021; Ledell Wu, 2020; Chen and Li, 2021) rely on textual descriptions of entities or relations. Descriptions provide the required information about the semantics of entities (or relations), which help the models to identify entity mentions in texts without observing them during training. Works such as (Ledell Wu, 2020; De Cao et al., 2021) and (Aly et al., 2021) show how effective it is to use textual descriptions to perform entity recognition tasks in the zero-shot context.
- Europe > Middle East (0.04)
- Africa > Middle East (0.04)
- North America > United States > Massachusetts (0.04)
- Media (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine (0.93)
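The description-based zero-shot idea above can be sketched concretely: score a mention's context against the textual description of each unseen class and pick the most similar one. Bag-of-words cosine similarity below stands in for the learned encoders such methods actually use; the labels and descriptions are toy examples.

```python
# Minimal sketch of description-based zero-shot entity classification.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def zero_shot_classify(mention_context, descriptions):
    """Assign the label whose description is most similar to the context."""
    ctx = Counter(mention_context.lower().split())
    scores = {label: cosine(ctx, Counter(desc.lower().split()))
              for label, desc in descriptions.items()}
    return max(scores, key=scores.get)

descriptions = {
    "ORG": "a company institution or organization of people",
    "LOC": "a place city country or geographic location",
}
print(zero_shot_classify("the company announced record profits", descriptions))
# ORG
```

Because the classifier only needs a description at inference time, new entity types can be added without any labeled training examples, which is the core appeal the abstract describes.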
A Controlled Reevaluation of Coreference Resolution Models
Porada, Ian, Zou, Xiyuan, Cheung, Jackie Chi Kit
All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained language model. Whether the superior performance of one CR model over another is due to the choice of language model or other factors, such as the task-specific architecture, is difficult or impossible to determine due to the lack of a standardized experimental setup. To resolve this ambiguity, we systematically evaluate five CR models and control for certain design decisions including the pretrained language model used by each. When controlling for language model size, encoder-based CR models outperform more recent decoder-based models in terms of both accuracy and inference speed. Surprisingly, among encoder-based CR models, more recent models are not always more accurate, and the oldest CR model that we test generalizes the best to out-of-domain textual genres. We conclude that controlling for the choice of language model reduces most, but not all, of the increase in F1 score reported in the past five years.
- North America > Dominican Republic (0.04)
- North America > United States > New York (0.04)
- North America > United States > Maryland > Howard County > Columbia (0.04)
Do "English" Named Entity Recognizers Work Well on Global Englishes?
Shan, Alexander, Bauer, John, Carlson, Riley, Manning, Christopher
The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset, the Worldwide English NER Dataset, to analyze NER model performance on low-resource English variants from around the world. We test widely used NER toolkits and transformer models, including models using the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: a commonly used British English newswire dataset, CoNLL 2003; a more American-focused dataset, OntoNotes; and our global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant performance drops (over 10 F1 in some cases) when tested on the Worldwide English dataset. Upon examination of region-specific errors, we observe the greatest performance drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance. Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or OntoNotes lost only 1-2 F1 on both test sets.
- Europe > Middle East (0.24)
- Africa > Middle East (0.24)
- Oceania (0.24)
- Media (0.68)
- Government (0.68)
- Leisure & Entertainment (0.46)