AITopics | anonymization technique

Collaborating Authors

anonymization technique

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness

Arcolezi, Héber H., Alishahi, Mina, Bendoukha, Adda-Akram, Kaaniche, Nesrine

arXiv.org Artificial IntelligenceNov-3-2025

Machine learning (ML) algorithms are heavily based on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raises critical privacy concerns. Anonymization techniques have emerged as a practical solution to address these issues by generalizing features or suppressing data to make it more difficult to accurately identify individuals. Although recent studies have shown that privacy-enhancing technologies can influence ML predictions across different subgroups, thus affecting fair decision-making, the specific effects of anonymization techniques, such as $k$-anonymity, $\ell$-diversity, and $t$-closeness, on ML fairness remain largely unexplored. In this work, we systematically audit the impact of anonymization techniques on ML fairness, evaluating both individual and group fairness. Our quantitative study reveals that anonymization can degrade group fairness metrics by up to fourfold. Conversely, similarity-based individual fairness metrics tend to improve under stronger anonymization, largely as a result of increased input homogeneity. By analyzing varying levels of anonymization across diverse privacy settings and data distributions, this study provides critical insights into the trade-offs between privacy, fairness, and utility, offering actionable guidelines for responsible AI development. Our code is publicly available at: https://github.com/hharcolezi/anonymity-impact-fairness.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/faia250909

2505.07985

Country:

North America > United States (0.93)
Europe (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Target speaker anonymization in multi-speaker recordings

Tomashenko, Natalia, Yamagishi, Junichi, Wang, Xin, Liu, Yun, Vincent, Emmanuel

arXiv.org Artificial IntelligenceOct-13-2025

Most of the existing speaker anonymization research has focused on single-speaker audio, leading to the development of techniques and evaluation metrics optimized for such condition. This study addresses the significant challenge of speaker anonymization within multi-speaker conversational audio, specifically when only a single target speaker needs to be anonymized. This scenario is highly relevant in contexts like call centers, where customer privacy necessitates anonymizing only the customer's voice in interactions with operators. Conventional anonymization methods are often not suitable for this task. Moreover, current evaluation methodology does not allow us to accurately assess privacy protection and utility in this complex multi-speaker scenario. This work aims to bridge these gaps by exploring effective strategies for targeted speaker anonymization in conversational audio, highlighting potential problems in their development and proposing corresponding improved evaluation methodologies.

artificial intelligence, machine learning, target speaker, (15 more...)

arXiv.org Artificial Intelligence

2510.09307

Country: Asia > Japan (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Impact of Event Data Partitioning on Privacy-aware Process Discovery

Lim, Jungeun, Fahrenkrog-Petersen, Stephan A., Lu, Xixi, Mendling, Jan, Song, Minseok

arXiv.org Artificial IntelligenceJul-9-2025

Information systems support the execution of business processes. The event logs of these executions generally contain sensitive information about customers, patients, and employees. The corresponding privacy challenges can be addressed by anonymizing the event logs while still retaining utility for process discovery. However, trading off utility and privacy is difficult: the higher the complexity of event log, the higher the loss of utility by anonymization. In this work, we propose a pipeline that combines anonymization and event data partitioning, where event abstraction is utilized for partitioning. By leveraging event abstraction, event logs can be segmented into multiple parts, allowing each sub-log to be anonymized separately. This pipeline preserves privacy while mitigating the loss of utility. To validate our approach, we study the impact of event partitioning on two anonymization techniques using three real-world event logs and two process discovery techniques. Our results demonstrate that event partitioning can bring improvements in process discovery utility for directly-follows-based anonymization techniques.

anonymization, artificial intelligence, event log, (14 more...)

arXiv.org Artificial Intelligence

2507.06008

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Add feedback

Children's Voice Privacy: First Steps And Emerging Challenges

Kulkarni, Ajinkya, Teixeira, Francisco, Hermann, Enno, Rolland, Thomas, Trancoso, Isabel, Doss, Mathew Magimai

arXiv.org Artificial IntelligenceJun-5-2025

Children are one of the most under-represented groups in speech technologies, as well as one of the most vulnerable in terms of privacy. Despite this, anonymization techniques targeting this population have received little attention. In this study, we seek to bridge this gap, and establish a baseline for the use of voice anonymization techniques designed for adult speech when applied to children's voices. Such an evaluation is essential, as children's speech presents a distinct set of challenges when compared to that of adults. This study comprises three children's datasets, six anonymization methods, and objective and subjective utility metrics for evaluation. Our results show that existing systems for adults are still able to protect children's voice privacy, but suffer from much higher utility degradation. In addition, our subjective study displays the challenges of automatic evaluation methods for speech quality in children's speech, highlighting the need for further research.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.001

Country: Europe > Portugal (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Forecasting Anonymized Electricity Load Profiles

Fernandez, Joaquin Delgado, Menci, Sergio Potenciano, Magitteri, Alessio

arXiv.org Artificial IntelligenceJan-8-2025

In the evolving landscape of data privacy, the anonymization of electric load profiles has become a critical issue, especially with the enforcement of the General Data Protection Regulation (GDPR) in Europe. These electric load profiles, which are essential datasets in the energy industry, are classified as personal behavioral data, necessitating stringent protective measures. This article explores the implications of this classification, the importance of data anonymization, and the potential of forecasting using microaggregated data. The findings underscore that effective anonymization techniques, such as microaggregation, do not compromise the performance of forecasting models under certain conditions (i.e., forecasting aggregated). In such an aggregated level, microaggregated data maintains high levels of utility, with minimal impact on forecasting accuracy. The implications for the energy sector are profound, suggesting that privacy-preserving data practices can be integrated into smart metering technology applications without hindering their effectiveness. In early 2009, the European Commission, through Directive 2009/72/EC, mandated the widespread deployment of smart meters across Europe [1].

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.06237

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Serbia > Central Serbia > Belgrade (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)
Government > Regional Government > Europe Government (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

NTU-NPU System for Voice Privacy 2024 Challenge

Kuzmin, Nikita, Luong, Hieu-Thi, Yao, Jixun, Xie, Lei, Lee, Kong Aik, Chng, Eng Siong

arXiv.org Artificial IntelligenceOct-3-2024

B3 The baseline system B3 uses a Wasserstein generative adversarial In this work, we describe our submissions for the Voice Privacy network with Quadratic Transport Cost (WGAN-QC) [6] Challenge 2024. Rather than proposing a novel speech to generate artificial pseudo-speaker embeddings, anonymizing anonymization system, we enhance the provided baselines to the speaker's identity through four main steps: meet all required conditions and improve evaluated metrics. Specifically, we implement emotion embedding and experiment 1. Phonetic Transcriptions Extraction: Phonetic transcriptions with WavLM and ECAPA2 speaker embedders for the B3 baseline.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/SPSC.2024-13

2410.02371

Country:

Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)

Add feedback

Synthetic Data: Revisiting the Privacy-Utility Trade-off

Sarmin, Fatima Jahan, Sarkar, Atiquer Rahman, Wang, Yang, Mohammed, Noman

arXiv.org Artificial IntelligenceJul-9-2024

Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATEGAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general and unconstrained environment. Our experimentation demonstrated that synthetic data achieves a more favorable privacy-utility trade-off compared to the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.

attacker, dataset, outlier, (17 more...)

arXiv.org Artificial Intelligence

2407.07926

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Texas (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

Chen, Yu, Li, Tingxin, Liu, Huiming, Yu, Yang

arXiv.org Artificial IntelligenceSep-6-2023

Numerous companies have started offering services based on large language models (LLM), such as ChatGPT, which inevitably raises privacy concerns as users' prompts are exposed to the model provider. Previous research on secure reasoning using multi-party computation (MPC) has proven to be impractical for LLM applications due to its time-consuming and communication-intensive nature. While lightweight anonymization techniques can protect private information in prompts through substitution or masking, they fail to recover sensitive data replaced in the LLM-generated results. In this paper, we expand the application scenarios of anonymization techniques by training a small local model to de-anonymize the LLM's returned results with minimal computational overhead. We introduce the HaS framework, where "H(ide)" and "S(eek)" represent its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization, respectively. To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models. Furthermore, we conduct experiments to evaluate HaS's usability in translation and classification tasks. The experimental findings demonstrate that the HaS framework achieves an optimal balance between privacy protection and utility.

computational overhead, llm, privacy, (14 more...)

arXiv.org Artificial Intelligence

2309.03057

Country:

North America > United States > District of Columbia > Washington (0.14)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education

Kanwal, Neel, Janssen, Emiel A. M., Engan, Kjersti

arXiv.org Artificial IntelligenceAug-8-2023

The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bio-informatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.09426

Country:

Europe > Norway > Western Norway > Rogaland > Stavanger (0.05)
North America > United States > Washington (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

Long-term Conversation Analysis: Exploring Utility and Privacy

Nespoli, Francesco, Pohlhausen, Jule, Naylor, Patrick A., Bitzer, Joerg

arXiv.org Artificial IntelligenceJun-28-2023

The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a speaker diarization system, while privacy protection is determined with a speech recognition and a speaker verification model. We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy.

artificial intelligence, machine learning, speech, (17 more...)

arXiv.org Artificial Intelligence

2306.16071

Country:

Europe > Germany > Lower Saxony > Oldenburg (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback