AITopics | entity and relation extraction

Collaborating Authors

entity and relation extraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLP

Duan, Decheng, Zhang, Yingyi, Peng, Jitong, Zhang, Chengzhi

arXiv.org Artificial IntelligenceSep-23-2025

Structured information extraction from scientific literature is crucial for capturing core concepts and emerging trends in specialized fields. While existing datasets aid model development, most focus on specific publication sections due to domain complexity and the high cost of annotating scientific texts. To address this limitation, we introduce SciNLP--a specialized benchmark for full-text entity and relation extraction in the Natural Language Processing (NLP) domain. The dataset comprises 60 manually annotated full-text NLP publications, covering 7,072 entities and 1,826 relations. Compared to existing research, SciNLP is the first dataset providing full-text annotations of entities and their relationships in the NLP domain. To validate the effectiveness of SciNLP, we conducted comparative experiments with similar datasets and evaluated the performance of state-of-the-art supervised models on this dataset. Results reveal varying extraction capabilities of existing models across academic texts of different lengths. Cross-comparisons with existing datasets show that SciNLP achieves significant performance improvements on certain baseline models. Using models trained on SciNLP, we implemented automatic construction of a fine-grained knowledge graph for the NLP domain. Our KG has an average node degree of 3.2 per entity, indicating rich semantic topological information that enhances downstream applications. The dataset is publicly available at: https://github.com/AKADDC/SciNLP.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.07801

Country:

Asia (1.00)
North America > United States (0.93)
Europe (0.93)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations

Popovič, Nicholas, Kangen, Ashish, Schopf, Tim, Färber, Michael

arXiv.org Artificial IntelligenceJul-9-2025

Large, high-quality annotated corpora remain scarce in document-level entity and relation extraction in zero-shot or few-shot settings. In this paper, we present a fully automatic, LLM-based pipeline for synthetic data generation and in-context learning for document-level entity and relation extraction. In contrast to existing approaches that rely on manually annotated demonstrations or direct zero-shot inference, our method combines synthetic data generation with retrieval-based in-context learning, using a reasoning-optimized language model. This allows us to build a high-quality demonstration database without manual annotation and to dynamically retrieve relevant examples at inference time. Based on our approach we produce a synthetic dataset of over $5k$ Wikipedia abstracts with approximately $59k$ entities and $30k$ relation triples. Finally, we evaluate in-context learning performance on the DocIE shared task, extracting entities and relations from long documents in a zero-shot setting. We find that in-context joint entity and relation extraction at document-level remains a challenging task, even for state-of-the-art large language models.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.05997

Country:

Europe > Germany > Saxony > Dresden (0.40)
Europe > Austria > Vienna (0.17)
Asia > Thailand > Bangkok > Bangkok (0.04)
(5 more...)

Genre:

Research Report (0.82)
Personal > Honors (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

The Joint Entity-Relation Extraction Model Based on Span and Interactive Fusion Representation for Chinese Medical Texts with Complex Semantics

Feng, Danni, Li, Runzhi, Wang, Jing, Yan, Siyu, Ma, Lihong, Xing, Yunli

arXiv.org Artificial IntelligenceFeb-13-2025

Joint entity-relation extraction is a critical task in transforming unstructured or semi-structured text into triplets, facilitating the construction of large-scale knowledge graphs, and supporting various downstream applications. Despite its importance, research on Chinese text, particularly with complex semantics in specialized domains like medicine, remains limited. To address this gap, we introduce the CH-DDI, a Chinese drug-drug interactions dataset designed to capture the intricacies of medical text. Leveraging the strengths of attention mechanisms in capturing long-range dependencies, we propose the SEA module, which enhances the extraction of complex contextual semantic information, thereby improving entity recognition and relation extraction. Additionally, to address the inefficiencies of existing methods in facilitating information exchange between entity recognition and relation extraction, we present an interactive fusion representation module. This module employs Cross Attention for bidirectional information exchange between the tasks and further refines feature extraction through BiLSTM. Experimental results on both our CH-DDI dataset and public CoNLL04 dataset demonstrate that our model exhibits strong generalization capabilities. On the CH-DDI dataset, our model achieves an F1-score of 96.73% for entity recognition and 78.43% for relation extraction.

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.09247

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Henan Province > Zhengzhou (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

ADEQA: A Question Answer based approach for joint ADE-Suspect Extraction using Sequence-To-Sequence Transformers

Arannil, Vinayak, Deb, Tomal, Roy, Atanu

arXiv.org Artificial IntelligenceDec-19-2024

Early identification of Adverse Drug Events (ADE) is critical for taking prompt actions while introducing new drugs into the market. These ADEs information are available through various unstructured data sources like clinical study reports, patient health records, social media posts, etc. Extracting ADEs and the related suspect drugs using machine learning is a challenging task due to the complex linguistic relations between drug ADE pairs in textual data and unavailability of large corpus of labelled datasets. This paper introduces ADEQA, a question-answer(QA) based approach using quasi supervised labelled data and sequence-to-sequence transformers to extract ADEs, drug suspects and the relationships between them. Unlike traditional QA models, natural language generation (NLG) based models don't require extensive token level labelling and thereby reduces the adoption barrier significantly. On a public ADE corpus, we were able to achieve state-of-the-art results with an F1 score of 94% on establishing the relationships between ADEs and the respective suspects.

extraction, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.bionlp-1.17

2412.1551

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.04)
North America > Dominican Republic (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)

Add feedback

Generalized knowledge-enhanced framework for biomedical entity and relation extraction

Nguyen, Minh, Le, Phuong

arXiv.org Artificial IntelligenceAug-13-2024

In recent years, there has been an increasing number of frameworks developed for biomedical entity and relation extraction. This research effort aims to address the accelerating growth in biomedical publications and the intricate nature of biomedical texts, which are written for mainly domain experts. To handle these challenges, we develop a novel framework that utilizes external knowledge to construct a task-independent and reusable background knowledge graph for biomedical entity and relation extraction. The design of our model is inspired by how humans learn domain-specific topics. In particular, humans often first acquire the most basic and common knowledge regarding a field to build the foundational knowledge and then use that as a basis for extending to various specialized topics. Our framework employs such common-knowledge-sharing mechanism to build a general neural-network knowledge graph that is learning transferable to different domain-specific biomedical texts effectively. Experimental evaluations demonstrate that our model, equipped with this generalized and cross-transferable knowledge base, achieves competitive performance benchmarks, including BioRelEx for binding interaction detection and ADE for Adverse Drug Effect identification.

entity and relation extraction, extraction, relation extraction, (14 more...)

arXiv.org Artificial Intelligence

2408.06618

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

Zhou, Yang, Shan, Shimin, Wei, Hongkui, Zhao, Zhehuan, Feng, Wenshuo

arXiv.org Artificial IntelligenceMay-30-2024

Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation, utilizing a LLM to obtain pseudo-samples with the same sentence meaning but with different representations and forms by paraphrasing the original training set samples. As well as instructing LLM to generate sentences that implicitly contain information about the corresponding labels based on the relation and entity of the original training set samples. These two kinds of pseudo-samples participate in the training of the RE model together with the original dataset, respectively. The PGA framework in the experiment improves the F1 scores of the three mainstream models for RE within the scientific domain. Also, using a LLM to obtain samples can effectively reduce the cost of manually labeling data.

computational linguistic, extraction, information, (15 more...)

arXiv.org Artificial Intelligence

2405.20787

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.05)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction

Zaratiana, Urchade, Tomeh, Nadi, Khbir, Niama El, Holat, Pierre, Charnois, Thierry

arXiv.org Artificial IntelligenceApr-18-2024

Information extraction (IE) is an important task in Natural Language Processing (NLP), involving the extraction of named entities and their relationships from unstructured text. In this paper, we propose a novel approach to this task by formulating it as graph structure learning (GSL). By formulating IE as GSL, we enhance the model's ability to dynamically refine and optimize the graph structure during the extraction process. This formulation allows for better interaction and structure-informed decisions for entity and relation prediction, in contrast to previous models that have separate or untied predictions for these tasks. When compared against state-of-the-art baselines on joint entity and relation extraction benchmarks, our model, GraphER, achieves competitive results.

computational linguistic, extraction, relation extraction, (14 more...)

arXiv.org Artificial Intelligence

2404.12491

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > New York (0.04)
(15 more...)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

EnriCo: Enriched Representation and Globally Constrained Inference for Entity and Relation Extraction

Zaratiana, Urchade, Tomeh, Nadi, Dauxais, Yann, Holat, Pierre, Charnois, Thierry

arXiv.org Artificial IntelligenceApr-18-2024

Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs. Despite recent progress, existing approaches often fall short in two key aspects: richness of representation and coherence in output structure. These models often rely on handcrafted heuristics for computing entity and relation representations, potentially leading to loss of crucial information. Furthermore, they disregard task and/or dataset-specific constraints, resulting in output structures that lack coherence. In our work, we introduce EnriCo, which mitigates these shortcomings. Firstly, to foster rich and expressive representation, our model leverage attention mechanisms that allow both entities and relations to dynamically determine the pertinent information required for accurate extraction. Secondly, we introduce a series of decoding algorithms designed to infer the highest scoring solutions while adhering to task and dataset-specific constraints, thus promoting structured and coherent outputs. Our model demonstrates competitive performance compared to baselines when evaluated on Joint IE datasets.

extraction, relation, representation, (15 more...)

arXiv.org Artificial Intelligence

2404.12493

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)

Add feedback

AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models

Cao, Lang, Sun, Jimeng, Cross, Adam

arXiv.org Artificial IntelligenceMar-1-2024

Objectives: Our objective is to create an end-to-end system called AutoRD, which automates extracting information from clinical text about rare diseases. We have conducted various tests to evaluate the performance of AutoRD and highlighted its strengths and limitations in this paper. Materials and Methods: Our system, AutoRD, is a software pipeline involving data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We implement this using large language models and medical knowledge graphs developed from open-source medical ontologies. We quantitatively evaluate our system on entity extraction, relation extraction, and the performance of knowledge graph construction. Results: AutoRD achieves an overall F1 score of 47.3%, a 14.4% improvement compared to the base LLM. In detail, AutoRD achieves an overall entity extraction F1 score of 56.1% (rare_disease: 83.5%, disease: 35.8%, symptom_and_sign: 46.1%, anaphor: 67.5%) and an overall relation extraction F1 score of 38.6% (produces: 34.7%, increases_risk_of: 12.4%, is_a: 37.4%, is_acronym: 44.1%, is_synonym: 16.3%, anaphora: 57.5%). Our qualitative experiment also demonstrates that the performance in constructing the knowledge graph is commendable. Discussion: AutoRD demonstrates the potential of LLM applications in rare disease detection. This improvement is attributed to several design, including the integration of ontologies-enhanced LLMs. Conclusion: AutoRD is an automated end-to-end system for extracting rare disease information from text to build knowledge graphs. It uses ontologies-enhanced LLMs for a robust medical knowledge base. The superior performance of AutoRD is validated by experimental evaluations, demonstrating the potential of LLMs in healthcare.

autord, extraction, rare disease, (14 more...)

arXiv.org Artificial Intelligence

2403.00953

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Illinois > Peoria County > Peoria (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > Bahrain (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model

Zhao, Jianli, Xu, Changhao, Jiang, Bin

arXiv.org Artificial IntelligenceFeb-24-2024

Relational triple extraction is a fundamental task in the field of information extraction, and a promising framework based on table filling has recently gained attention as a potential baseline for entity relation extraction. However, inherent shortcomings such as redundant information and incomplete triple recognition remain problematic. To address these challenges, we propose an Implicit Perspective for relational triple Extraction based on Diffusion model (IPED), an innovative approach for extracting relational triples. Our classifier-free solution adopts an implicit strategy using block coverage to complete the tables, avoiding the limitations of explicit tagging methods. Additionally, we introduce a generative model structure, the block-denoising diffusion model, to collaborate with our implicit perspective and effectively circumvent redundant information disruptions. Experimental results on two popular datasets demonstrate that IPED achieves state-of-the-art performance while gaining superior inference speed and low computational complexity. To support future research, we have made our source code publicly available online.

computational linguistic, extraction, linguistic, (16 more...)

arXiv.org Artificial Intelligence

2403.00808

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Mexico > Mexico City > Mexico City (0.05)
North America > Canada > Ontario > Toronto (0.05)
(11 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback