named-entity recognition


Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data

Micallef, Kurt, Habash, Nizar, Borg, Claudia

arXiv.org Artificial Intelligence

Maltese is a unique Semitic language that has evolved under extensive influence from Romance and Germanic languages, particularly Italian and English. Despite its Semitic roots, its orthography is based on the Latin script, creating a gap between it and its closest linguistic relatives in Arabic. In this paper, we explore whether Arabic-language resources can support Maltese natural language processing (NLP) through cross-lingual augmentation techniques. We investigate multiple strategies for aligning Arabic textual data with Maltese, including various transliteration schemes and machine translation (MT) approaches. As part of this, we also introduce novel transliteration systems that better represent Maltese orthography. We evaluate the impact of these augmentations on monolingual and multilingual models and demonstrate that Arabic-based augmentation can significantly benefit Maltese NLP tasks.
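
As a rough illustration of what character-level transliteration toward Maltese orthography involves, here is a toy Python sketch mapping a few Arabic letters to conventional Maltese counterparts (e.g., ش to x, ح to ħ). The mapping and example are purely illustrative; the transliteration systems introduced in the paper are necessarily much richer, handling vowels, digraphs, and ambiguity.

    # Toy character-level Arabic-to-Latin transliteration (illustrative only;
    # not the paper's transliteration systems).
    AR_TO_MT = {
        "ب": "b", "ت": "t", "د": "d", "ر": "r", "س": "s",
        "ش": "x",  # Maltese <x> denotes /ʃ/, like Arabic <ش>
        "ح": "ħ",  # Maltese <ħ> corresponds to Arabic <ح>
        "ق": "q", "م": "m", "ل": "l", "ن": "n", "ك": "k",
    }

    def transliterate(arabic_text: str) -> str:
        """Map each Arabic character to a Latin (Maltese-style) character,
        leaving unmapped characters unchanged."""
        return "".join(AR_TO_MT.get(ch, ch) for ch in arabic_text)

    print(transliterate("كتب"))  # -> "ktb" (cf. Maltese "kiteb"; vowels need a richer model)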


PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

Mancera, Gonzalo, Morales, Aythami, Fierrez, Julian, Tolosana, Ruben, Peña, Alejandro, Lopez-Duran, Miguel, Jurado, Francisco, Ortigosa, Alvaro

arXiv.org Artificial Intelligence

The use of Natural Language Processing (NLP) in high-stakes AI-based applications has increased significantly in recent years, especially since the emergence of Large Language Models (LLMs). However, despite their strong performance, LLMs introduce important legal and ethical concerns, particularly regarding privacy, data protection, and transparency. Due to these concerns, this work explores the use of Named-Entity Recognition (NER) to facilitate the privacy-preserving training (or adaptation) of LLMs. We propose a framework that uses NER technologies to anonymize sensitive information in text data, such as personal identities or geographic locations. An evaluation of the proposed privacy-preserving learning framework was conducted to measure its impact on user privacy and system performance in a particular high-stakes and sensitive setup: AI-based resume scoring for recruitment processes. The study involved two language models (BERT and RoBERTa) and six anonymization algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT) applied to a database of 24,000 candidate profiles. The findings indicate that the proposed privacy-preservation techniques effectively maintain system performance while playing a critical role in safeguarding candidate confidentiality, thus promoting trust in the evaluated scenario. On top of the proposed privacy-preserving approach, we also experiment with applying an existing approach that reduces gender bias in LLMs, finally obtaining our proposed Privacy- and Bias-aware LLMs (PBa-LLMs). Note that the proposed PBa-LLMs have been evaluated in a particular setup (resume scoring) but are generally applicable to any other LLM-based AI application.
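
To make the anonymization step concrete, here is a minimal sketch using Microsoft Presidio, one of the anonymization engines the paper evaluates. The snippet and the example text are illustrative and do not reproduce the authors' exact configuration.

    # Minimal NER-based anonymization sketch with Microsoft Presidio.
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    resume_snippet = "Jane Doe, based in Madrid, led NLP projects at Acme Corp."

    # Detect sensitive entities such as PERSON and LOCATION ...
    results = analyzer.analyze(text=resume_snippet, language="en")

    # ... and replace each detected span with a placeholder like <PERSON>.
    anonymized = anonymizer.anonymize(text=resume_snippet, analyzer_results=results)
    print(anonymized.text)  # e.g., "<PERSON>, based in <LOCATION>, led NLP projects at Acme Corp."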


LLMs as Data Annotators: How Close Are We to Human Performance?

Haq, Muhammad Uzair Ul, Rigoni, Davide, Sperduti, Alessandro

arXiv.org Artificial Intelligence

In NLP, fine-tuning LLMs is effective for various applications but requires high-quality annotated data. However, manual annotation of data is labor-intensive, time-consuming, and costly. Therefore, LLMs are increasingly used to automate the process, often employing in-context learning (ICL), in which some examples related to the task are given in the prompt for better performance. However, manually selecting context examples can lead to inefficiencies and suboptimal model performance. This paper presents comprehensive experiments comparing several LLMs, considering different embedding models, across various datasets for the Named Entity Recognition (NER) task. The evaluation encompasses models with approximately 7B and 70B parameters, including both proprietary and non-proprietary models. Furthermore, leveraging the success of Retrieval-Augmented Generation (RAG), it also considers a method that addresses the limitations of ICL by automatically retrieving contextual examples, thereby enhancing performance. The results highlight the importance of selecting the appropriate LLM and embedding model, understanding the trade-offs between LLM sizes and desired performance, and the necessity of directing research efforts towards more challenging datasets.
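
A minimal sketch of the retrieval idea, assuming a sentence-transformers embedder and a small hypothetical annotated pool: embed the pool once, then select the nearest neighbors of each new sentence as in-context examples. The model name, pool, and prompt format are placeholders, not the paper's setup.

    # Retrieval-based selection of in-context examples for LLM annotation.
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical annotated pool: (sentence, NER annotation) pairs.
    pool = [
        ("Barack Obama visited Paris.", "[Barack Obama|PER] visited [Paris|LOC]."),
        ("Apple acquired a startup.", "[Apple|ORG] acquired a startup."),
    ]
    pool_emb = embedder.encode([s for s, _ in pool], convert_to_tensor=True)

    def build_prompt(query: str, k: int = 1) -> str:
        """Retrieve the k most similar annotated sentences and prepend them."""
        q_emb = embedder.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(q_emb, pool_emb, top_k=k)[0]
        examples = "\n".join(
            f"Input: {pool[h['corpus_id']][0]}\nOutput: {pool[h['corpus_id']][1]}"
            for h in hits
        )
        return f"{examples}\nInput: {query}\nOutput:"

    print(build_prompt("Google opened an office in Berlin."))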


Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks

Yildirim, Savas

arXiv.org Artificial Intelligence

Deep learning-based and, more recently, Transformer-based language models have dominated natural language processing research in recent years. Thanks to their accurate and fast fine-tuning characteristics, they have outperformed traditional machine learning-based approaches and achieved state-of-the-art results on many challenging natural language understanding (NLU) problems. Recent studies have shown that Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) reach impressive results on many tasks. Moreover, thanks to their transfer learning capacity, these architectures allow pre-built models to be transferred and fine-tuned to specific NLU tasks such as question answering. In this study, we provide a Transformer-based model and a baseline benchmark for the Turkish language. We successfully fine-tuned a Turkish BERT model, namely BERTurk (trained with base settings), on several downstream tasks and evaluated it on a Turkish benchmark dataset. We show that our models significantly outperform existing baseline approaches for Named-Entity Recognition, Sentiment Analysis, Question Answering, and Text Classification in Turkish. We publicly release these four fine-tuned models and resources for reproducibility and to support other Turkish researchers and applications.
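
As a sketch of what such fine-tuning looks like with Hugging Face Transformers, the following loads the publicly released BERTurk base checkpoint for token classification and runs one toy training step. The checkpoint name is the public BERTurk model; the example sentence, labels, and hyperparameters are placeholders rather than the paper's setup.

    # One toy fine-tuning step of BERTurk for token classification.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    checkpoint = "dbmdz/bert-base-turkish-cased"  # BERTurk, base settings
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=3)

    enc = tokenizer("Ali Ankara'ya gitti", return_tensors="pt")
    labels = torch.zeros_like(enc["input_ids"])  # toy labels: every token tagged 0 ("O")

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    loss = model(**enc, labels=labels).loss  # cross-entropy over per-token labels
    loss.backward()
    optimizer.step()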


CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets

Verma, Harsh, Bagherzadeh, Parsa, Bergler, Sabine

arXiv.org Artificial Intelligence

This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of diseases mentioned in Spanish tweets. The simplicity of this pipeline, and its injection of knowledge from readily available domain resources rather than training purely from training data, are our system's strengths.
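
A toy sketch of gazetteer-based knowledge injection for disease-mention tagging: mark any token that matches an entry in a medical gazetteer. The entries and label name are hypothetical; a real system such as the one summarized above combines such lookups with a trained tagger.

    # Gazetteer lookup as a weak NER signal (illustrative entries and label).
    GAZETTEER = {"diabetes", "covid", "gripe", "asma"}

    def gazetteer_tags(tokens):
        """Return an IOB-style tag per token based on exact gazetteer matches."""
        return ["B-ENFERMEDAD" if tok.lower() in GAZETTEER else "O" for tok in tokens]

    print(gazetteer_tags("Mi abuela tiene diabetes y asma".split()))
    # -> ['O', 'O', 'O', 'B-ENFERMEDAD', 'O', 'B-ENFERMEDAD']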


GitHub - chakki-works/seqeval: A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)

#artificialintelligence

This is well-tested against the Perl script conlleval, which can be used to measure the performance of a system that has processed the CoNLL-2000 shared task data. The default mode is compatible with conlleval; if you want to use it, you don't need to specify anything. In strict mode, the inputs are evaluated according to the specified scheme; the behavior of strict mode differs from the default mode, which is designed to simulate conlleval. If you want to use strict mode, specify both the mode='strict' and scheme arguments at the same time, as in the sketch below:
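
A short usage sketch based on the description above (this is seqeval's actual API; the toy tag sequences are made up):

    from seqeval.metrics import f1_score
    from seqeval.scheme import IOB2

    y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
    y_pred = [["B-PER", "I-PER", "O", "B-LOC"]]

    print(f1_score(y_true, y_pred))  # default mode, conlleval-compatible
    print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))  # strict mode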


Zero-Shot Learning in Named-Entity Recognition with External Knowledge

Van Hoang, Nguyen, Mulvad, Soeren Hougaard, Rong, Dexter Neo Yuan, Yue, Yang

arXiv.org Artificial Intelligence

A significant shortcoming of current state-of-the-art (SOTA) named-entity recognition (NER) systems is their lack of generalization to unseen domains, which poses a major problem since obtaining labeled data for NER in a new domain is expensive and time-consuming. We propose ZERO, a model that performs zero-shot and few-shot learning in NER to generalize to unseen domains by incorporating pre-existing knowledge in the form of semantic word embeddings. ZERO first obtains contextualized word representations of input sentences using the model LUKE, reduces their dimensionality, and compares them directly with the embeddings of the external knowledge, allowing ZERO to be trained to recognize unseen output entities. We find that ZERO performs well on unseen NER domains with an average macro F1 score of 0.23, outperforms LUKE in few-shot learning, and even achieves competitive scores on an in-domain comparison. The performance across source-target domain pairs is shown to be inversely correlated with the pairs' KL divergence.
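
Conceptually, the prediction step reduces to a nearest-neighbor search in embedding space, as in the following sketch. The random vectors stand in for LUKE's dimensionality-reduced token representations and for the external semantic word embeddings; they are not the model's actual components.

    # Nearest-label prediction against external label embeddings (stand-in vectors).
    import numpy as np

    rng = np.random.default_rng(0)
    token_repr = rng.normal(size=300)  # stand-in for a reduced contextualized representation
    label_embeddings = {               # stand-in for external semantic embeddings
        "disease": rng.normal(size=300),
        "chemical": rng.normal(size=300),
        "O": rng.normal(size=300),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pred = max(label_embeddings, key=lambda lbl: cosine(token_repr, label_embeddings[lbl]))
    print(pred)  # nearest label in embedding space, even if unseen during training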


Fine-Tuning Transformers for NLP

#artificialintelligence

You can see a complete working example in our Colab Notebook, and you can play with the trained models on HuggingFace. Since first being developed and released in the Attention Is All You Need paper, Transformers have completely redefined the field of Natural Language Processing (NLP), setting the state of the art on numerous tasks such as question answering, language generation, and named-entity recognition. Here we won't go into too much detail about what a Transformer is, but rather how to apply and train one to help achieve some task at hand. The main things to keep in mind conceptually about Transformers are that they are very good at dealing with sequential data (text, speech, etc.), they act as an encoder-decoder framework in which data is mapped to a representational space by the encoder before being mapped to the output by the decoder, and they scale incredibly well to parallel processing hardware (GPUs). Transformers in NLP have been trained on massive amounts of text data, which allows them to understand both the syntax and semantics of a language very well.
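
For a sense of how little code applying a pretrained Transformer takes, here is a minimal NER example with the Hugging Face pipeline API (illustrative; the post's Colab notebook covers full fine-tuning):

    # Apply a pretrained Transformer to NER in a few lines.
    from transformers import pipeline

    ner = pipeline("ner", aggregation_strategy="simple")  # downloads a default NER model
    print(ner("Hugging Face is based in New York City."))
    # -> entity groups such as ORG for "Hugging Face" and LOC for "New York City"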


Transfer Learning for Named-Entity Recognition with Neural Networks

Lee, Ji Young, Dernoncourt, Franck, Szolovits, Peter

arXiv.org Machine Learning

Recent approaches based on artificial neural networks (ANNs) have shown promising results for named-entity recognition (NER). In order to achieve high performance, ANNs need to be trained on a large labeled dataset. However, labels might be difficult to obtain for the dataset on which the user wants to perform NER: label scarcity is particularly pronounced for patient note de-identification, which is an instance of NER. In this work, we analyze to what extent transfer learning may address this issue. In particular, we demonstrate that transferring an ANN model trained on a large labeled dataset to another dataset with a limited number of labels improves upon the state-of-the-art results on two different datasets for patient note de-identification.
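
A generic PyTorch sketch of the transfer recipe the paper analyzes: reuse the lower layers of a source-trained tagger and retrain the task-specific output layer on the small target dataset. The architecture and sizes here are illustrative, not the paper's model.

    # Transfer all layers except the label-specific output head.
    import torch.nn as nn

    class TaggerSketch(nn.Module):
        def __init__(self, vocab=5000, emb=64, hidden=128, n_labels=5):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_labels)  # label space is task-specific

        def forward(self, x):
            h, _ = self.lstm(self.embed(x))
            return self.out(h)

    source = TaggerSketch(n_labels=9)  # imagine this trained on a large labeled corpus
    target = TaggerSketch(n_labels=5)  # small target task with a different label set
    state = {k: v for k, v in source.state_dict().items() if not k.startswith("out.")}
    target.load_state_dict(state, strict=False)  # transfer embeddings + LSTM, keep a fresh head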


NeuroNER: an easy-to-use program for named-entity recognition based on neural networks

Dernoncourt, Franck, Lee, Ji Young, Szolovits, Peter

arXiv.org Machine Learning

Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT); the annotations are then used to train an ANN, which in turn predicts entities' locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.
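
The flow starts from BRAT's standoff annotations. As an illustration of that first step (generic parsing code, not NeuroNER's own), entity lines in a BRAT .ann file look like "T1<TAB>Disease 10 18<TAB>diabetes" and can be read as follows:

    # Parse text-bound entity annotations from a BRAT standoff (.ann) file.
    # Contiguous spans only; discontinuous spans ("10 18;20 25") are skipped here.
    def parse_brat_ann(path):
        entities = []
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("T") and ";" not in line.split("\t")[1]:
                    tid, type_span, surface = line.rstrip("\n").split("\t")
                    etype, start, end = type_span.split()
                    entities.append((tid, etype, int(start), int(end), surface))
        return entities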