AITopics | pseudonymization

Collaborating Authors

pseudonymization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services

Serenari, Jayden, Lee, Stephen

arXiv.org Artificial IntelligenceNov-3-2025

With the increasing use of conversational AI systems, there is growing concern over privacy leaks, especially when users share sensitive personal data in interactions with Large Language Models (LLMs). Conversations shared with these models may contain Personally Identifiable Information (PII), which, if exposed, could lead to security breaches or identity theft. To address this challenge, we present the Local Optimizations for Pseudonymization with Semantic Integrity Directed Entity Detection (LOPSIDED) framework, a semantically-aware privacy agent designed to safeguard sensitive PII data when using remote LLMs. Unlike prior work that often degrade response quality, our approach dynamically replaces sensitive PII entities in user prompts with semantically consistent pseudonyms, preserving the contextual integrity of conversations. Once the model generates its response, the pseudonyms are automatically depseudonymized, ensuring the user receives an accurate, privacy-preserving output. We evaluate our approach using real-world conversations sourced from ShareGPT, which we further augment and annotate to assess whether named entities are contextually relevant to the model's response. Our results show that LOPSIDED reduces semantic utility errors by a factor of 5 compared to baseline techniques, all while enhancing privacy.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.27016

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Current State in Privacy-Preserving Text Preprocessing for Domain-Agnostic NLP

Sinha, Abhirup, Saha, Pritilata, Saha, Tithi

arXiv.org Artificial IntelligenceAug-6-2025

Privacy is a fundamental human right. Data privacy is protected by different regulations, such as GDPR. However, modern large language models require a huge amount of data to learn linguistic variations, and the data often contains private information. Research has shown that it is possible to extract private information from such language models. Thus, anonymizing such private and sensitive information is of utmost importance. While complete anonymization may not be possible, a number of different pre-processing approaches exist for masking or pseudonymizing private information in textual data. This report focuses on a few of such approaches for domain-agnostic NLP tasks.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.03204

Country:

North America > United States > California (0.04)
Europe > Germany (0.04)
Asia > India > Tamil Nadu > Vellore (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi

Ransing, Rasika, Dhamaskar, Mohammed Amaan, Rajpurohit, Ayush, Dhoke, Amey, Dalvi, Sanket

arXiv.org Artificial IntelligenceDec-23-2024

India's vast linguistic diversity presents unique challenges and opportunities for technological advancement, especially in the realm of Natural Language Processing (NLP). While there has been significant progress in NLP applications for widely spoken languages, the regional languages of India, such as Marathi and Hindi, remain underserved. Research in the field of NLP for Indian regional languages is at a formative stage and holds immense significance. The paper aims to build a platform which enables the user to use various features like text anonymization, abstractive text summarization and spell checking in English, Hindi and Marathi language. The aim of these tools is to serve enterprise and consumer clients who predominantly use Indian Regional Languages.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.18163

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > Canada > Ontario > Toronto (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(4 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks

Riabi, Arij, Mahamdi, Menel, Mouilleron, Virginie, Seddah, Djamé

arXiv.org Artificial IntelligenceJun-25-2024

Protecting privacy is essential when sharing data, particularly in the case of an online radicalization dataset that may contain personal information. In this paper, we explore the balance between preserving data usefulness and ensuring robust privacy safeguards, since regulations like the European GDPR shape how personal information must be handled. We share our method for manually pseudonymizing a multilingual radicalization dataset, ensuring performance comparable to the original data. Furthermore, we highlight the importance of establishing comprehensive guidelines for processing sensitive NLP data by sharing our complete pseudonymization process, our guidelines, the challenges we encountered as well as the resulting dataset.

computational linguistic, dataset, information, (15 more...)

arXiv.org Artificial Intelligence

2406.17875

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > Montserrat (0.04)
Europe > Faroe Islands > Streymoy > Tórshavn (0.04)
(15 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
(2 more...)

Add feedback

Grandma Karl is 27 years old -- research agenda for pseudonymization of research data

Volodina, Elena, Dobnik, Simon, Tiedemann, Therese Lindström, Vu, Xuan-Son

arXiv.org Artificial IntelligenceAug-30-2023

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information which it contains, e.g names or political opinions. General Data Protection Regulation (GDPR) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data. This paper outlines a research agenda within pseudonymization, namely need of studies into the effects of pseudonymization on unstructured data in relation to e.g. readability and language assessment, as well as the effectiveness of pseudonymization as a way of protecting writer identity, while also exploring different ways of developing context-sensitive algorithms for detection, labelling and replacement of personal information in unstructured data. The recently granted project on pseudonymization Grandma Karl is 27 years old addresses exactly those challenges.

information, pseudonymization, research data, (15 more...)

arXiv.org Artificial Intelligence

2308.16109

Country:

Europe > Poland (0.05)
Europe > Sweden > Östergötland County > Linköping (0.05)
Europe > Sweden > Stockholm > Stockholm (0.05)
(8 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education

Kanwal, Neel, Janssen, Emiel A. M., Engan, Kjersti

arXiv.org Artificial IntelligenceAug-8-2023

The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bio-informatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.09426

Country:

Europe > Norway > Western Norway > Rogaland > Stavanger (0.05)
North America > United States > Washington (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

Yermilov, Oleksandr, Raheja, Vipul, Chernodub, Artem

arXiv.org Artificial IntelligenceJun-8-2023

This work investigates the effectiveness of different pseudonymization techniques, ranging from rule-based substitutions to using pre-trained Large Language Models (LLMs), on a variety of datasets and models used for two widely used NLP tasks: text classification and summarization. Our work provides crucial insights into the gaps between original and anonymized data (focusing on the pseudonymization technique) and model quality and fosters future research into higher-quality anonymization techniques to better balance the trade-offs between data protection and utility preservation. We make our code, pseudonymized datasets, and downstream models publicly available

computational linguistic, movie, pseudonymization, (15 more...)

arXiv.org Artificial Intelligence

2306.05561

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia (0.14)
North America > Montserrat (0.05)
(9 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Sports > Soccer (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback