Goto

Collaborating Authors

 tacred


Post-Training Language Models for Continual Relation Extraction

arXiv.org Artificial Intelligence

Real-world data, such as news articles, social media posts, and chatbot conversations, is inherently dynamic and non-s tationary, presenting significant challenges for constructing real-t ime structured representations through knowledge graphs (KGs). Relation Extraction (RE), a fundamental component of KG creation, often struggl es to adapt to evolving data when traditional models rely on static, out dated datasets. Continual Relation Extraction (CRE) methods tackle this is sue by in-crementally learning new relations while preserving previ ously acquired knowledge. This study investigates the application of pre-trained language models (PLMs), specifically large language models (LL Ms), to CRE, with a focus on leveraging memory replay to address cata strophic forgetting. We evaluate decoder-only models (eg, Mistral-7B and Llama2-7B) and encoder-decoder models (eg, Flan-T5 Base) on the TAC RED and FewRel datasets. Task-incremental fine-tuning of LLMs d emonstrates superior performance over earlier approaches using encode r-only models like BERT on TACRED, excelling in seen-task accuracy and overall performance (measured by whole and average accuracy), part icularly with the Mistral and Flan-T5 models. Results on FewRel are si milarly promising, achieving second place in whole and average accu racy metrics. This work underscores critical factors in knowledge transf er, language model architecture, and KG completeness, advancing CRE wit h LLMs and memory replay for dynamic, real-time relation extracti on.


Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback

arXiv.org Artificial Intelligence

This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback while retaining prior knowledge. This paradigm distinctively addresses two major limitations of traditional continual learning: (1) dynamic model updates using streaming, real-time human-annotated data, rather than static datasets with fixed labels, and (2) the assumption of clean labels, by explicitly handling the noisy feedback common in real-world interactions. To tackle these problems, we propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs) to learn new skills effectively from dynamic feedback. RiCL incorporates three key components: a temporal consistency-aware purifier to automatically discern clean from noisy samples in data streams; an interaction-aware direct preference optimization strategy to align model behavior with human intent by reconciling AI-generated and human-provided feedback; and a noise-resistant contrastive learning module that captures robust representations by exploiting inherent data relationships, thus avoiding reliance on potentially unreliable labels. Extensive experiments on two benchmark datasets (FewRel and TACRED), contaminated with realistic noise patterns, demonstrate that our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.


Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards

arXiv.org Artificial Intelligence

This paper investigates the transparency in the creation of benchmarks and the use of leaderboards for measuring progress in NLP, with a focus on the relation extraction (RE) task. Existing RE benchmarks often suffer from insufficient documentation, lacking crucial details such as data sources, inter-annotator agreement, the algorithms used for the selection of instances for datasets, and information on potential biases like dataset imbalance. Progress in RE is frequently measured by leaderboards that rank systems based on evaluation methods, typically limited to aggregate metrics like F1-score. However, the absence of detailed performance analysis beyond these metrics can obscure the true generalisation capabilities of models. Our analysis reveals that widely used RE benchmarks, such as TACRED and NYT, tend to be highly imbalanced and contain noisy labels. Moreover, the lack of class-based performance metrics fails to accurately reflect model performance across datasets with a large number of relation types. These limitations should be carefully considered when reporting progress in RE. While our discussion centers on the transparency of RE benchmarks and leaderboards, the observations we discuss are broadly applicable to other NLP tasks as well. Rather than undermining the significance and value of existing RE benchmarks and the development of new models, this paper advocates for improved documentation and more rigorous evaluation to advance the field.


Preserving Generalization of Language models in Few-shot Continual Relation Extraction

arXiv.org Artificial Intelligence

Few-shot Continual Relations Extraction (FCRE) is an emerging and dynamic area of study where models can sequentially integrate knowledge from new relations with limited labeled data while circumventing catastrophic forgetting and preserving prior knowledge from pre-trained backbones. In this work, we introduce a novel method that leverages often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.


Towards Realistic Few-Shot Relation Extraction: A New Meta Dataset and Evaluation

arXiv.org Artificial Intelligence

We introduce a meta dataset for few-shot relation extraction, which includes two datasets derived from existing supervised relation extraction datasets - NYT29 (Takanobu et al., 2019; Nayak and Ng, 2020) and WIKI-DATA (Sorokin and Gurevych, 2017) - as well as a few-shot form of the TACRED dataset (Sabo et al., 2021). Importantly, all these few-shot datasets were generated under realistic assumptions such as: the test relations are different from any relations a model might have seen before, limited training data, and a preponderance of candidate relation mentions that do not correspond to any of the relations of interest. Using this large resource, we conduct a comprehensive evaluation of six recent few-shot relation extraction methods, and observe that no method comes out as a clear winner. Further, the overall performance on this task is low, indicating substantial need for future research. We release all versions of the data, i.e., both supervised and few-shot, for future research.


Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors

arXiv.org Artificial Intelligence

Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations while avoiding forgetting old ones with few labeled training data. The primary challenges are catastrophic forgetting and overfitting. This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models to address the above two challenges, thereby making language models better continual few-shot relation extractors. Specifically, we propose a Contrastive Prompt Learning framework, which designs prompt representation to acquire more generalized knowledge that can be easily adapted to old and new categories, and margin-based contrastive learning to focus more on hard samples, therefore alleviating catastrophic forgetting and overfitting issues. To further remedy overfitting in low-resource scenarios, we introduce an effective memory augmentation strategy that employs well-crafted prompts to guide ChatGPT in generating diverse samples. Extensive experiments demonstrate that our method outperforms state-of-the-art methods by a large margin and significantly mitigates catastrophic forgetting and overfitting in low-resource scenarios.


Revisiting Large Language Models as Zero-shot Relation Extractors

arXiv.org Artificial Intelligence

Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data even if under zero-shot setting. Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt, which provides the possibility of extracting relations from text without any data and parameter tuning. This work focuses on the study of exploring LLMs, such as ChatGPT, as zero-shot relation extractors. On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE. We propose the summarize-and-ask (\textsc{SumAsk}) prompting, a simple prompt recursively using LLMs to transform RE inputs to the effective question answering (QA) format. On the other hand, we conduct comprehensive experiments on various benchmarks and settings to investigate the capabilities of LLMs on zero-shot RE. Specifically, we have the following findings: (i) \textsc{SumAsk} consistently and significantly improves LLMs performance on different model sizes, benchmarks and settings; (ii) Zero-shot prompting with ChatGPT achieves competitive or superior results compared with zero-shot and fully supervised methods; (iii) LLMs deliver promising performance in extracting overlapping relations; (iv) The performance varies greatly regarding different relations. Different from small language models, LLMs are effective in handling challenge none-of-the-above (NoTA) relation.


Noise in Relation Classification Dataset TACRED: Characterization and Reduction

arXiv.org Artificial Intelligence

The overarching objective of this paper is two-fold. First, to explore model-based approaches to characterize the primary cause of the noise. in the RE dataset TACRED Second, to identify the potentially noisy instances. Towards the first objective, we analyze predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED shows that the majority of the noise in the dataset originates from the instances labeled as no-relation which are negative examples. For the second objective, we explore two nearest-neighbor-based strategies to automatically identify potentially noisy examples for elimination and reannotation. Our first strategy, referred to as Intrinsic Strategy (IS), is based on the assumption that positive examples are clean. Thus, we have used false-negative predictions to identify noisy negative examples. Whereas, our second approach, referred to as Extrinsic Strategy, is based on using a clean subset of the dataset to identify potentially noisy negative examples. Finally, we retrained the SOTA models on the eliminated and reannotated dataset. Our empirical results based on two SOTA models trained on TACRED-E following the IS show an average 4% F1-score improvement, whereas reannotation (TACRED-R) does not improve the original results. However, following ES, SOTA models show the average F1-score improvement of 3.8% and 4.4% when trained on respective eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets respectively. We further extended the ES for cleaning positive examples as well, which resulted in an average performance improvement of 5.8% and 5.6% for the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets respectively.


How Fragile is Relation Extraction under Entity Replacements?

arXiv.org Artificial Intelligence

Relation extraction (RE) aims to extract the relations between entity names from the textual context. In principle, textual context determines the ground-truth relation and the RE models should be able to correctly identify the relations reflected by the textual context. However, existing work has found that the RE models memorize the entity name patterns to make RE predictions while ignoring the textual context. This motivates us to raise the question: ``are RE models robust to the entity replacements?'' In this work, we operate the random and type-constrained entity replacements over the RE instances in TACRED and evaluate the state-of-the-art RE models under the entity replacements. We observe the 30\% - 50\% F1 score drops on the state-of-the-art RE models under entity replacements. These results suggest that we need more efforts to develop effective RE models robust to entity replacements. We release the source code at https://github.com/wangywUST/RobustRE.


Serial Contrastive Knowledge Distillation for Continual Few-shot Relation Extraction

arXiv.org Artificial Intelligence

Therefore, the continual Heist and Paulheim, 2017; Zhang et al., 2018) few-shot RE paradigm (Qin and Joty, 2022) mainly assume a fixed pre-defined relation set and was proposed to simulate real human learning scenarios, train on a fixed dataset. However, they cannot work where new knowledge can be acquired from well with the new relations that continue emerging a small number of new samples. As illustrated in in some real-world scenarios of RE. Continual Figure 1, the continual few-shot RE paradigm expects RE (Wang et al., 2019; Han et al., 2020; Wu et al., the model to continuously learn new relations 2021) was proposed as a new paradigm to solve through abundant training data only for the first this situation, which applies the idea of continual task, but through sparse training data for all subsequent learning (Parisi et al., 2019) to the field of RE. tasks. Thus, the model needs to identify Compared with conventional RE, continual RE the growing relations well with few labeled data is more challenging. It requires the model to learn for them while retaining the knowledge on old relations emerging relations while maintaining a stable and without re-training from scratch. As relations accurate classification of old relations, i.e., the socalled grow, the confusion about relation representations catastrophic forgetting problem (Thrun and leads to catastrophic forgetting.