Anonymisation


Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example

Ghosh, Suhita, Jouaiti, Melanie, Das, Arnab, Sinha, Yamini, Polzehl, Tim, Siegert, Ingo, Stober, Sebastian

arXiv.org Artificial Intelligence

Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains, which is essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.
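The synthesis side of DDSP can be illustrated with a toy oscillator: a waveform rendered directly from a frame-level F0 contour by phase accumulation. This stdlib sketch is only illustrative; the paper's model is differentiable and learned end-to-end, and the sample rate and hop size below are arbitrary stand-ins.

```python
import math

def synthesise_f0(f0_contour, sr=16000, hop=200):
    """Render a mono waveform from a frame-level F0 contour (Hz)
    by phase accumulation -- the core oscillator idea behind DDSP.
    Illustrative only: the actual model adds learned harmonics and
    filtered noise, and is differentiable, which this version is not."""
    samples, phase = [], 0.0
    for f0 in f0_contour:
        for _ in range(hop):  # upsample frame rate to sample rate
            phase += 2 * math.pi * f0 / sr
            samples.append(math.sin(phase))
    return samples

# Four frames: an octave jump from A3 (220 Hz) to A4 (440 Hz).
wave = synthesise_f0([220.0, 220.0, 440.0, 440.0])
```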


Privacy-Preserving Synthetically Augmented Knowledge Graphs with Semantic Utility

Bellomarini, Luigi, Catalano, Costanza, Coletta, Andrea, Iezzi, Michela, Samarati, Pierangela

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) have recently gained considerable attention in many application domains, from healthcare to biotechnology, from logistics to finance. Financial organisations, central banks, economic research entities, and national supervision authorities apply ontological reasoning on KGs to address crucial business tasks, such as economic policymaking, banking supervision, anti-money laundering, and economic research. Reasoning allows for the generation of derived knowledge capturing complex business semantics and the setup of effective business processes. A major obstacle to KG sharing is privacy: the identity of the data subjects and their sensitive or company-confidential information may be improperly exposed. In this paper, we propose a novel framework to enable KG sharing while ensuring that information that should remain private is neither directly released nor indirectly exposed via derived knowledge, while maintaining the embedded knowledge of the KGs to support downstream business tasks. Our approach produces a privacy-preserving synthetic KG as an augmentation of the input one via the introduction of structural anonymisation. We introduce a novel privacy measure for KGs that considers derived knowledge, define a new utility metric that captures the business semantics we want to preserve, and propose two novel anonymisation algorithms. Our extensive experimental evaluation, with both synthetic graphs and real-world datasets, confirms the effectiveness of our approach, achieving up to a 70% improvement in the privacy of entities compared to existing methods not specifically designed for KGs.
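One classic notion behind structural anonymisation is degree k-anonymity: no node should be singled out by its degree. The sketch below is a deliberately simplified stand-in, assuming an adjacency-dict graph and a greedy edge-adding heuristic; the paper's algorithms are different and additionally account for derived knowledge, which this toy ignores.

```python
from collections import Counter

def is_degree_k_anonymous(adj, k):
    """A graph is degree k-anonymous if every degree value occurring
    in it is shared by at least k nodes."""
    degrees = Counter(len(nbrs) for nbrs in adj.values())
    return all(count >= k for count in degrees.values())

def augment(adj, k):
    """Greedy toy heuristic: keep linking the two lowest-degree
    non-adjacent nodes with a synthetic edge until the graph is
    degree k-anonymous (or complete)."""
    while not is_degree_k_anonymous(adj, k):
        nodes = sorted(adj, key=lambda n: len(adj[n]))
        added = False
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if v not in adj[u]:
                    adj[u].add(v)
                    adj[v].add(u)
                    added = True
                    break
            if added:
                break
        if not added:  # graph is complete; nothing more to add
            break
    return adj

# A path graph 1-2-3: degree 2 occurs only once, so not 2-anonymous.
g = {1: {2}, 2: {1, 3}, 3: {2}}
augment(g, 2)  # adds the synthetic edge 1-3, making all degrees equal
```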


Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study

Asimopoulos, Dimitris, Siniosoglou, Ilias, Argyriou, Vasileios, Goudos, Sotirios K., Psannis, Konstantinos E., Karditsioti, Nikoleta, Saoulidis, Theocharis, Sarigiannidis, Panagiotis

arXiv.org Artificial Intelligence

In the digital era, with escalating privacy concerns, it is imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformer architecture. Each model presents unique strengths: LSTM models long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations via deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, compared alongside the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
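Whichever tagger performs best, the downstream anonymisation step is the same: replace each detected entity span with a typed placeholder. A minimal sketch of that step, where the character spans are hand-written stand-ins for the output of any of the compared models (CRF, LSTM, ELMo, or a Transformer):

```python
def anonymise(text, entities):
    """Replace detected entity spans with typed placeholders.
    `entities` is a list of (start, end, label) character spans.
    Spans are applied right-to-left so earlier offsets stay valid."""
    for start, end, label in sorted(entities, key=lambda e: e[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

sample = "Alice Smith moved to Berlin in 2020."
spans = [(0, 11, "PERSON"), (21, 27, "LOCATION")]  # hand-written, not model output
print(anonymise(sample, spans))  # [PERSON] moved to [LOCATION] in 2020.
```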


Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches

Asimopoulos, Dimitris, Siniosoglou, Ilias, Argyriou, Vasileios, Karamitsou, Thomai, Fountoukidis, Eleftherios, Goudos, Sotirios K., Moscholios, Ioannis D., Psannis, Konstantinos E., Sarigiannidis, Panagiotis

arXiv.org Artificial Intelligence

In the realm of data privacy, the ability to effectively anonymise text is paramount. With the proliferation of deep learning and, in particular, transformer architectures, there is a burgeoning interest in leveraging these advanced models for text anonymisation tasks. This paper presents a comprehensive benchmarking study comparing the performance of transformer-based models and Large Language Models (LLMs) against traditional architectures for text anonymisation. Utilising the CoNLL-2003 dataset, known for its robustness and diversity, we evaluate several models. Our results showcase the strengths and weaknesses of each approach, offering a clear perspective on the efficacy of modern versus traditional methods. Notably, while modern models exhibit advanced capabilities in capturing contextual nuances, certain traditional architectures still achieve high performance. This work aims to guide researchers in selecting the most suitable model for their anonymisation needs, while also shedding light on potential paths for future advancements in the field.


Vocoder drift compensation by x-vector alignment in speaker anonymisation

Panariello, Michele, Todisco, Massimiliano, Evans, Nicholas

arXiv.org Artificial Intelligence

For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function which is used to substitute an original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since there is a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for the design of better anonymisation functions in the future.
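Drift of this kind is naturally quantified in x-vector space, for example as one minus the cosine similarity between the pseudo-speaker x-vector that was requested and the x-vector re-extracted from the vocoded output. The sketch below uses 3-dimensional toy vectors (real x-vectors are typically 512-dimensional) and is an illustration of the general idea, not the paper's exact measurement protocol.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim stand-ins: the x-vector handed to the vocoder vs. the one
# re-extracted from the synthesised speech.
requested = [1.0, 0.0, 0.0]
re_extracted = [0.8, 0.6, 0.0]
drift = 1.0 - cosine_similarity(requested, re_extracted)  # larger = more drift
```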


Natural Language Processing for low-resource languages

AIHub

Clearly, such an imbalance in language resources is undesirable, putting those who do not use English at a disadvantage. In this article, we highlight some of the work and initiatives being carried out on low-resource languages. Africa is one of the most linguistically diverse regions in the world. Despite this, African languages are barely represented in technology and research. Lanfrica aims to mitigate the difficulty encountered in the discovery of African language resources by creating a centralised hub.


Textwash -- automated open-source text anonymisation

Kleinberg, Bennett, Davies, Toby, Mozes, Maximilian

arXiv.org Artificial Intelligence

With the increasing digitisation of society and human communication, text data are becoming more important for research in the social and behavioural sciences (Gentzkow, Kelly, and Taddy 2019; Salganik 2019). Advances made in natural language processing (NLP) in particular have led to exciting insights derived from text data (e.g., on emotional responses to the pandemic (Kleinberg, Vegt, and Mozes 2020) or on the rhetoric around immigration in political speeches (Card et al. 2022); for an overview, see (Boyd and Schwartz 2021)). Importantly, the use of computational techniques to quantify and analyse text data has triggered a demand for large datasets (often of several tens of thousands of documents) that can be harnessed for machine learning approaches (e.g., (Socher et al. 2013; Lewis et al. 2020)). This need for larger datasets, combined with an appetite to use text data for the study of social science phenomena, has resulted in a dilemma: many of the important questions require targeted, primary data collection or access to potentially sensitive data. However, such data are hard to obtain, not because they do not exist but because sharing them is constrained by data protection regulations and ethical concerns. One potential consequence is that research activity may be biased toward topics for which suitable data are readily available rather than those that are most important. One of the few viable solutions to this dilemma is automated text anonymisation; that is, the large-scale processing of text data so that individuals cannot be identified from the resulting output. Such a method would allow sensitive data to flow so that the staggering potential of text data can be exploited for scientific progress. With this paper and the tool it introduces, we seek to enable researchers to work with such sensitive data in a way that protects the privacy of individuals whilst retaining the usefulness of anonymised data for computational text analysis.


The MAPA toolkit: sharing your data privately

#artificialintelligence

Think of all the data sources within public administration services that include your personal information: bank account details, financial or medical records, tax information, and so on. We often take it for granted that our data is safe and protected. However, what happens when this information is shared among different public administration entities? In reality, the General Data Protection Regulation (GDPR) safeguards the general public by limiting what data can be shared among entities, requiring that the data be anonymised before it is shared among different entities, including those within the public administration. The Multilingual Anonymisation for Public Administration (MAPA) Project is a European-funded project developing an open-source toolkit that enables effective and reliable text anonymisation, focusing on the medical and legal domains.


The Difficulty of Graph Anonymisation - KDnuggets

#artificialintelligence

This article is written in response to the recent TraceTogether privacy saga. For the non-Singaporeans out there, TraceTogether is Singapore's contact tracing initiative in response to the COVID-19 pandemic in Singapore. The objective of the programme was to quickly identify people who might have been in close contact with anyone who has tested positive for the virus. It comprises an app or a physical token which uses Bluetooth signals to store proximity records. As of the end of December 2020, 70% of Singapore residents were supposedly on the programme.


Ethical aspects of Artificial Intelligence, part 2/2: Differential privacy - Datascience.aero

#artificialintelligence

As the second installment in this series of posts, I will touch upon the topic of privacy in data science and algorithms. In particular, I'm going to discuss a relatively novel concept called differential privacy, which promises, similarly to algorithmic fairness, a way of quantifying the privacy of AI algorithms. When we, as humans, talk about privacy, we mostly refer to a desire not to be observed by others. However, what does privacy mean in the context of algorithms that "observe" us by using data that holds information about us? In a very general sense, we could say that privacy is preserved if, after analysis, the algorithm that used our data (e.g. an application on our smartphones) doesn't know anything about us.
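The standard construction that makes this quantifiable is the Laplace mechanism: add noise drawn from a Laplace distribution with scale equal to the query's sensitivity divided by epsilon. A minimal sketch, where the count query, sensitivity of 1, and epsilon value are illustrative choices rather than anything prescribed by the article:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random.Random(0)):
    """Return an epsilon-differentially-private answer to a numeric
    query by adding Laplace(0, sensitivity/epsilon) noise. Toy sketch:
    real deployments also need careful sensitivity analysis and
    privacy-budget accounting across repeated queries."""
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse CDF of a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise

# "How many users are in the dataset?" -- adding or removing one
# person changes a count by at most 1, so the sensitivity is 1.
noisy_count = laplace_mechanism(true_value=1000, sensitivity=1, epsilon=0.5)
```

A smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy.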