AITopics | Papadopoulou, Anthi

Collaborating Authors

Papadopoulou, Anthi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis

Papadopoulou, Anthi, Lison, Pierre, Anderson, Mark, Øvrelid, Lilja, Pilán, Ildikó

arXiv.org Artificial IntelligenceOct-22-2023

Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred in it. In this paper, we consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance on two recently published datasets: the Text Anonymization Benchmark (Pil\'an et al., 2022) and a collection of Wikipedia biographies (Papadopoulou et al., 2022). The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information. This privacy-oriented entity recognizer is trained by combining a standard named entity recognition model with a gazetteer populated by person-related terms extracted from Wikidata. The second step of the text sanitization process consists in assessing the privacy risk associated with each detected text span, either isolated or in combination with other text spans. We present five distinct indicators of the re-identification risk, respectively based on language model probabilities, text span classification, sequence labelling, perturbations, and web search. We provide a contrastive analysis of each privacy indicator and highlight their benefits and limitations, notably in relation to the available labeled data.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2310.14312

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)
(2 more...)

Genre:

Research Report (1.00)
Personal > Honors (0.46)

Industry:

Media (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(7 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Add feedback

The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization

Pilán, Ildikó, Lison, Pierre, Øvrelid, Lilja, Papadopoulou, Anthi, Sánchez, David, Batet, Montserrat

arXiv.org Artificial IntelligenceJan-25-2022

We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods. Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources, making it difficult to properly evaluate the level of privacy protection offered by various anonymization methods. This paper presents TAB (Text Anonymization Benchmark), a new, open-source annotated corpus developed to address this shortage. The corpus comprises 1,268 English-language court cases from the European Court of Human Rights (ECHR) enriched with comprehensive annotations about the personal information appearing in each document, including their semantic category, identifier type, confidential attributes, and co-reference relations. Compared to previous work, the TAB corpus is designed to go beyond traditional de-identification (which is limited to the detection of predefined semantic categories), and explicitly marks which text spans ought to be masked in order to conceal the identity of the person to be protected. Along with presenting the corpus and its annotation layers, we also propose a set of evaluation metrics that are specifically tailored towards measuring the performance of text anonymization, both in terms of privacy protection and utility preservation. We illustrate the use of the benchmark and the proposed metrics by assessing the empirical performance of several baseline text anonymization models. The full corpus along with its privacy-oriented annotation guidelines, evaluation scripts and baseline models are available on: https://github.com/NorskRegnesentral/text-anonymisation-benchmark

civil rights & constitutional law, information, machine learning, (25 more...)

arXiv.org Artificial Intelligence

2202.00443

Country:

Europe (1.00)
Asia > Middle East > Republic of Türkiye (0.28)
North America > United States > California (0.28)

Genre: Overview (0.92)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback