entity and relation extraction
DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations
Popovič, Nicholas, Kangen, Ashish, Schopf, Tim, Färber, Michael
Large, high-quality annotated corpora remain scarce in document-level entity and relation extraction in zero-shot or few-shot settings. In this paper, we present a fully automatic, LLM-based pipeline for synthetic data generation and in-context learning for document-level entity and relation extraction. In contrast to existing approaches that rely on manually annotated demonstrations or direct zero-shot inference, our method combines synthetic data generation with retrieval-based in-context learning, using a reasoning-optimized language model. This allows us to build a high-quality demonstration database without manual annotation and to dynamically retrieve relevant examples at inference time. Based on our approach, we produce a synthetic dataset of over 5k Wikipedia abstracts with approximately 59k entities and 30k relation triples. Finally, we evaluate in-context learning performance on the DocIE shared task, extracting entities and relations from long documents in a zero-shot setting. We find that in-context joint entity and relation extraction at the document level remains a challenging task, even for state-of-the-art large language models.
- Research Report (0.82)
- Personal > Honors (0.47)
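The retrieval-based in-context learning step described in the DocIE abstract above can be illustrated with a minimal sketch. It assumes a tiny demonstration database and uses bag-of-words cosine similarity as a stand-in for whatever retriever the authors actually use, which the abstract does not specify. All names (`retrieve_demonstrations`, `build_prompt`, `DATABASE`), the example documents, and the prompt layout are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a simple stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(query, database, k=2):
    """Return the k database entries most similar to the query document."""
    q = bag_of_words(query)
    ranked = sorted(
        database,
        key=lambda d: cosine(q, bag_of_words(d["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, demonstrations):
    """Concatenate retrieved demonstrations, then the query (hypothetical layout)."""
    parts = [f"Document: {d['text']}\nTriples: {d['triples']}" for d in demonstrations]
    parts.append(f"Document: {query}\nTriples:")
    return "\n\n".join(parts)


# Hypothetical synthetic demonstrations (document text plus annotated triples).
DATABASE = [
    {"text": "Marie Curie was born in Warsaw and won the Nobel Prize in Physics.",
     "triples": [("Marie Curie", "place_of_birth", "Warsaw")]},
    {"text": "The Danube is a river that flows through Vienna and Budapest.",
     "triples": [("Danube", "flows_through", "Vienna")]},
    {"text": "Albert Einstein was born in Ulm and won the Nobel Prize in Physics.",
     "triples": [("Albert Einstein", "place_of_birth", "Ulm")]},
]

query = "Niels Bohr was born in Copenhagen and won the Nobel Prize in Physics."
demos = retrieve_demonstrations(query, DATABASE, k=2)
prompt = build_prompt(query, demos)
```

The resulting `prompt` string would then be sent to a language model. A dense embedding retriever could replace `bag_of_words` and `cosine` without changing the rest of the sketch.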
The Joint Entity-Relation Extraction Model Based on Span and Interactive Fusion Representation for Chinese Medical Texts with Complex Semantics
Feng, Danni, Li, Runzhi, Wang, Jing, Yan, Siyu, Ma, Lihong, Xing, Yunli
Joint entity-relation extraction is a critical task in transforming unstructured or semi-structured text into triplets, facilitating the construction of large-scale knowledge graphs, and supporting various downstream applications. Despite its importance, research on Chinese text, particularly with complex semantics in specialized domains like medicine, remains limited. To address this gap, we introduce CH-DDI, a Chinese drug-drug interactions dataset designed to capture the intricacies of medical text. Leveraging the strengths of attention mechanisms in capturing long-range dependencies, we propose the SEA module, which enhances the extraction of complex contextual semantic information, thereby improving entity recognition and relation extraction. Additionally, to address the inefficiencies of existing methods in facilitating information exchange between entity recognition and relation extraction, we present an interactive fusion representation module. This module employs Cross Attention for bidirectional information exchange between the tasks and further refines feature extraction through BiLSTM. Experimental results on both our CH-DDI dataset and the public CoNLL04 dataset demonstrate that our model exhibits strong generalization capabilities. On the CH-DDI dataset, our model achieves an F1-score of 96.73% for entity recognition and 78.43% for relation extraction.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Henan Province > Zhengzhou (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
ADEQA: A Question Answer based approach for joint ADE-Suspect Extraction using Sequence-To-Sequence Transformers
Arannil, Vinayak, Deb, Tomal, Roy, Atanu
Early identification of Adverse Drug Events (ADEs) is critical for taking prompt action while introducing new drugs into the market. This ADE information is available through various unstructured data sources such as clinical study reports, patient health records, and social media posts. Extracting ADEs and the related suspect drugs using machine learning is a challenging task due to the complex linguistic relations between drug-ADE pairs in textual data and the unavailability of large labelled corpora. This paper introduces ADEQA, a question-answer (QA) based approach using quasi-supervised labelled data and sequence-to-sequence transformers to extract ADEs, drug suspects and the relationships between them. Unlike traditional QA models, natural language generation (NLG) based models don't require extensive token-level labelling and thereby reduce the adoption barrier significantly. On a public ADE corpus, we were able to achieve state-of-the-art results with an F1 score of 94% on establishing the relationships between ADEs and the respective suspects.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Dominican Republic (0.04)
Generalized knowledge-enhanced framework for biomedical entity and relation extraction
In recent years, there has been an increasing number of frameworks developed for biomedical entity and relation extraction. This research effort aims to address the accelerating growth in biomedical publications and the intricate nature of biomedical texts, which are written mainly for domain experts. To handle these challenges, we develop a novel framework that utilizes external knowledge to construct a task-independent and reusable background knowledge graph for biomedical entity and relation extraction. The design of our model is inspired by how humans learn domain-specific topics. In particular, humans often first acquire the most basic and common knowledge regarding a field to build the foundational knowledge and then use that as a basis for extending to various specialized topics. Our framework employs such a common-knowledge-sharing mechanism to build a general neural-network knowledge graph that transfers effectively to different domain-specific biomedical texts. Experimental evaluations demonstrate that our model, equipped with this generalized and cross-transferable knowledge base, achieves competitive performance on benchmarks including BioRelEx for binding interaction detection and ADE for Adverse Drug Effect identification.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction
Zhou, Yang, Shan, Shimin, Wei, Hongkui, Zhao, Zhehuan, Feng, Wenshuo
Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation: using an LLM to paraphrase the original training set samples, which yields pseudo-samples with the same sentence meaning but different representations and forms, and instructing the LLM to generate sentences that implicitly contain information about the corresponding labels, based on the relation and entities of the original training set samples. These two kinds of pseudo-samples are each used, together with the original dataset, to train the RE model. In our experiments, the PGA framework improves the F1 scores of three mainstream RE models in the scientific domain. Using an LLM to obtain samples can also effectively reduce the cost of manually labeling data.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.05)
- North America > Canada > Ontario > Toronto (0.04)
GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction
Zaratiana, Urchade, Tomeh, Nadi, Khbir, Niama El, Holat, Pierre, Charnois, Thierry
Information extraction (IE) is an important task in Natural Language Processing (NLP), involving the extraction of named entities and their relationships from unstructured text. In this paper, we propose a novel approach to this task by formulating it as graph structure learning (GSL). By formulating IE as GSL, we enhance the model's ability to dynamically refine and optimize the graph structure during the extraction process. This formulation allows for better interaction and structure-informed decisions for entity and relation prediction, in contrast to previous models that have separate or untied predictions for these tasks. When compared against state-of-the-art baselines on joint entity and relation extraction benchmarks, our model, GraphER, achieves competitive results.
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > New York (0.04)
- Research Report > Promising Solution (0.34)
- Overview > Innovation (0.34)
EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction
Zaratiana, Urchade, Tomeh, Nadi, Dauxais, Yann, Holat, Pierre, Charnois, Thierry
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs. Despite recent progress, existing approaches often fall short in two key aspects: richness of representation and coherence in output structure. These models often rely on handcrafted heuristics for computing entity and relation representations, potentially leading to loss of crucial information. Furthermore, they disregard task and/or dataset-specific constraints, resulting in output structures that lack coherence. In our work, we introduce EnriCo, which mitigates these shortcomings. Firstly, to foster rich and expressive representation, our model leverages attention mechanisms that allow both entities and relations to dynamically determine the pertinent information required for accurate extraction. Secondly, we introduce a series of decoding algorithms designed to infer the highest scoring solutions while adhering to task and dataset-specific constraints, thus promoting structured and coherent outputs. Our model demonstrates competitive performance compared to baselines when evaluated on Joint IE datasets.
- North America > Canada > Quebec > Montreal (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models
Cao, Lang, Sun, Jimeng, Cross, Adam
Objectives: Our objective is to create an end-to-end system called AutoRD, which automates extracting information from clinical text about rare diseases. We have conducted various tests to evaluate the performance of AutoRD and highlighted its strengths and limitations in this paper. Materials and Methods: Our system, AutoRD, is a software pipeline involving data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We implement this using large language models and medical knowledge graphs developed from open-source medical ontologies. We quantitatively evaluate our system on entity extraction, relation extraction, and the performance of knowledge graph construction. Results: AutoRD achieves an overall F1 score of 47.3%, a 14.4% improvement compared to the base LLM. In detail, AutoRD achieves an overall entity extraction F1 score of 56.1% (rare_disease: 83.5%, disease: 35.8%, symptom_and_sign: 46.1%, anaphor: 67.5%) and an overall relation extraction F1 score of 38.6% (produces: 34.7%, increases_risk_of: 12.4%, is_a: 37.4%, is_acronym: 44.1%, is_synonym: 16.3%, anaphora: 57.5%). Our qualitative experiment also demonstrates that the performance in constructing the knowledge graph is commendable. Discussion: AutoRD demonstrates the potential of LLM applications in rare disease detection. This improvement is attributed to several design choices, including the integration of ontologies-enhanced LLMs. Conclusion: AutoRD is an automated end-to-end system for extracting rare disease information from text to build knowledge graphs. It uses ontologies-enhanced LLMs for a robust medical knowledge base. The superior performance of AutoRD is validated by experimental evaluations, demonstrating the potential of LLMs in healthcare.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > United States > Illinois > Peoria County > Peoria (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Bahrain (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
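The F1 scores reported in the AutoRD abstract above combine precision and recall over extracted items. The sketch below shows how such a score is commonly computed for relation triples, assuming exact-match scoring over sets. The matching criteria the authors actually use are not stated in the abstract, and the function name `precision_recall_f1` and the example triples are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two collections of triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted triples, reusing relation labels from the abstract.
gold = [
    ("disease A", "produces", "symptom X"),
    ("disease A", "is_a", "disease B"),
    ("disease A", "is_synonym", "syndrome A"),
]
predicted = [
    ("disease A", "produces", "symptom X"),   # correct
    ("disease A", "produces", "symptom Y"),   # spurious
]

p, r, f1 = precision_recall_f1(predicted, gold)
```

Here one of two predictions is correct and one of three gold triples is found, so precision is 1/2, recall is 1/3, and F1 is 0.4.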
IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model
Zhao, Jianli, Xu, Changhao, Jiang, Bin
Relational triple extraction is a fundamental task in the field of information extraction, and a promising framework based on table filling has recently gained attention as a potential baseline for entity relation extraction. However, inherent shortcomings such as redundant information and incomplete triple recognition remain problematic. To address these challenges, we propose an Implicit Perspective for relational triple Extraction based on Diffusion model (IPED), an innovative approach for extracting relational triples. Our classifier-free solution adopts an implicit strategy using block coverage to complete the tables, avoiding the limitations of explicit tagging methods. Additionally, we introduce a generative model structure, the block-denoising diffusion model, to collaborate with our implicit perspective and effectively circumvent redundant information disruptions. Experimental results on two popular datasets demonstrate that IPED achieves state-of-the-art performance while gaining superior inference speed and low computational complexity. To support future research, we have made our source code publicly available online.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Mexico > Mexico City > Mexico City (0.05)
- North America > Canada > Ontario > Toronto (0.05)
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction
Zaratiana, Urchade, Tomeh, Nadi, Holat, Pierre, Charnois, Thierry
In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models that are left-to-right token-level generators, our approach is span-based. It generates a linearized graph where nodes represent text spans and edges represent relation triplets. Our method employs a transformer encoder-decoder architecture with a pointing mechanism on a dynamic vocabulary of spans and relation types. Our model can capture the structural characteristics and boundaries of entities and relations through span representations while simultaneously grounding the generated output in the original text thanks to the pointing mechanism. Evaluation on benchmark datasets validates the effectiveness of our approach, demonstrating competitive results. Code is available at https://github.com/urchade/ATG.
- North America > Canada > Quebec > Montreal (0.15)
- Europe > France (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia (0.04)
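The "linearized graph" in the abstract above, with text spans as nodes and relation triplets as edges, can be pictured with a small sketch. The sequence layout used here (typed spans, a separator, then head-span, tail-span, relation groups) is an assumption for illustration and is not taken from the ATG code. The names `linearize`, `delinearize`, and `SEP` are hypothetical.

```python
SEP = "<SEP>"  # hypothetical separator between the node part and the edge part


def linearize(entities, relations):
    """Flatten a span graph into one sequence.

    entities:  list of (start, end, type) with inclusive token indices
    relations: list of (head_index, tail_index, relation_type), indexing `entities`
    """
    sequence = [entity for entity in sorted(entities)]  # nodes in text order
    sequence.append(SEP)
    for head, tail, rel_type in relations:
        # Each edge points back to its two spans instead of copying their text.
        sequence.extend([entities[head][:2], entities[tail][:2], rel_type])
    return sequence


def delinearize(sequence, tokens):
    """Recover typed spans and surface-form triples from the flat sequence."""
    split = sequence.index(SEP)
    nodes, edge_items = sequence[:split], sequence[split + 1:]
    triples = []
    for i in range(0, len(edge_items), 3):
        (h_start, h_end), (t_start, t_end), rel_type = edge_items[i:i + 3]
        head_text = " ".join(tokens[h_start:h_end + 1])
        tail_text = " ".join(tokens[t_start:t_end + 1])
        triples.append((head_text, rel_type, tail_text))
    return nodes, triples


tokens = "Marie Curie was born in Warsaw .".split()
entities = [(0, 1, "PER"), (5, 5, "LOC")]
relations = [(0, 1, "born_in")]

sequence = linearize(entities, relations)
nodes, triples = delinearize(sequence, tokens)
```

Because every node and edge endpoint is a pair of token indices, the output stays grounded in the input text, which is the property the abstract attributes to the pointing mechanism.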