AITopics | wikipedia biography

Information about AI from the News, Publications, and Conferences

Re-identification of De-identified Documents with Autoregressive Infilling

Charpentier, Lucas Georges Gabriel, Lison, Pierre

arXiv.org Artificial Intelligence, May-20-2025

Documents revealing sensitive information about individuals must typically be de-identified. This de-identification is often done by masking all mentions of personally identifiable information (PII), thereby making it more difficult to uncover the identity of the person(s) in question. To investigate the robustness of de-identification methods, we present a novel, RAG-inspired approach that attempts the reverse process of re-identification based on a database of documents representing background knowledge. Given a text in which personal identifiers have been masked, the re-identification proceeds in two steps. A retriever first selects from the background knowledge passages deemed relevant for the re-identification. Those passages are then provided to an infilling model which seeks to infer the original content of each text span. This process is repeated until all masked spans are replaced. We evaluate the re-identification on three datasets (Wikipedia biographies, court rulings and clinical notes). Results show that (1) as many as 80% of de-identified text spans can be successfully recovered and (2) the re-identification accuracy increases along with the level of background knowledge.
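The retrieve-then-infill loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word-overlap retriever, the `toy_infill` stand-in for an infilling model, the `[MASK]` token, and the example passages are all hypothetical choices made only to show the control flow.

```python
import re
from typing import Callable, List

MASK = "[MASK]"  # assumed mask token; the paper's actual format is not given here


def tokenize(text: str) -> set:
    """Lowercased word set used by the toy retriever."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(masked_text: str, background: List[str], k: int = 2) -> List[str]:
    """Rank background passages by word overlap with the masked text."""
    query = tokenize(masked_text) - {"mask"}
    ranked = sorted(background, key=lambda p: len(query & tokenize(p)), reverse=True)
    return ranked[:k]


def toy_infill(text: str, passages: List[str]) -> str:
    """Hypothetical stand-in for an infilling model: copy the token that
    follows the same left-context word in a retrieved passage."""
    left = text.split(MASK, 1)[0].split()
    if not left:
        return "UNKNOWN"
    cue = left[-1].lower()
    for passage in passages:
        words = passage.split()
        for i, word in enumerate(words[:-1]):
            if word.lower() == cue:
                return words[i + 1].strip(".,")
    return "UNKNOWN"


def reidentify(masked_text: str, background: List[str],
               infill: Callable[[str, List[str]], str], k: int = 2) -> str:
    """Fill masked spans one at a time, retrieving again after each fill."""
    text = masked_text
    for _ in range(masked_text.count(MASK)):
        passages = retrieve(text, background, k)
        text = text.replace(MASK, infill(text, passages), 1)
    return text


# Invented toy data, not taken from any of the paper's datasets.
background = [
    "Oslo is a city in Norway.",
    "The engineer named Rivera designed a bridge.",
    "Rivera was born in Lisbon.",
]
masked = "The engineer named [MASK] was born in [MASK]."
result = reidentify(masked, background, toy_infill)
print(result)
```

Retrieval is rerun after every fill, so a span recovered early (here the name) can pull in passages that help with later spans, which mirrors the repeated process the abstract describes.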

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2505.12859

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)
Asia > Middle East (0.28)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.50)
Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis

Papadopoulou, Anthi, Lison, Pierre, Anderson, Mark, Øvrelid, Lilja, Pilán, Ildikó

arXiv.org Artificial Intelligence, Oct-22-2023

Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred in it. In this paper, we consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance on two recently published datasets: the Text Anonymization Benchmark (Pilán et al., 2022) and a collection of Wikipedia biographies (Papadopoulou et al., 2022). The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information. This privacy-oriented entity recognizer is trained by combining a standard named entity recognition model with a gazetteer populated by person-related terms extracted from Wikidata. The second step of the text sanitization process consists in assessing the privacy risk associated with each detected text span, either isolated or in combination with other text spans. We present five distinct indicators of the re-identification risk, respectively based on language model probabilities, text span classification, sequence labelling, perturbations, and web search. We provide a contrastive analysis of each privacy indicator and highlight their benefits and limitations, notably in relation to the available labeled data.
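The two-step pipeline in this abstract (detect candidate spans, then mask only those judged risky) can be sketched roughly as follows. This is an illustration under loud assumptions, not the paper's system: a capitalised-word regex stands in for the trained entity recognizer, the gazetteer is a one-item hypothetical list, and the `risk` callable with its invented scores stands in for the five risk indicators.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def detect_spans(text: str, gazetteer: List[str]) -> List[Span]:
    """Step 1 (toy): a capitalised-word regex stands in for an NER model,
    and gazetteer terms add person-related spans the regex would miss."""
    found = {m.span() for m in re.finditer(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)}
    for term in gazetteer:
        found |= {m.span() for m in re.finditer(re.escape(term), text, re.IGNORECASE)}
    kept: List[Span] = []
    for start, end in sorted(found):
        # Drop any span that overlaps one already kept.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


def sanitize(text: str, gazetteer: List[str], risk: Callable[[str], float],
             threshold: float = 0.5, mask: str = "***") -> str:
    """Step 2 (toy): mask every detected span whose risk reaches the threshold.
    Spans are processed right to left so earlier offsets stay valid."""
    for start, end in reversed(detect_spans(text, gazetteer)):
        if risk(text[start:end]) >= threshold:
            text = text[:start] + mask + text[end:]
    return text


# Invented example sentence and risk scores, for illustration only.
scores = {"Maria Rivera": 0.9, "cardiologist": 0.6, "Lisbon": 0.2}
text = "Maria Rivera, a cardiologist from Lisbon, won a prize."
sanitized = sanitize(text, ["cardiologist"], lambda span: scores.get(span, 0.0))
print(sanitized)
```

Keeping detection and risk assessment separate means the `risk` callable can be swapped for a different indicator without touching the span detector.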

probability, span, text span, (12 more...)

arXiv.org Artificial Intelligence

arXiv:2310.14312

Country:

Europe > Russia (0.28)
Asia > Russia (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(28 more...)

Genre:

Research Report (1.00)
Personal > Honors (0.46)

Industry:

Media (1.00)
Law > Statutes (1.00)
Law > Criminal Law (1.00)
(9 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events

Stranisci, Marco Antonio, Damiano, Rossana, Mensa, Enrico, Patti, Viviana, Radicioni, Daniele, Caselli, Tommaso

arXiv.org Artificial Intelligence, Jun-15-2023

Biographical event detection is a relevant task for the exploration and comparison of the ways in which people's lives are told and represented. In this sense, it may support several applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite that, there are no corpora and models specifically designed for this task. In this paper we fill this gap by presenting a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was compared with five existing corpora to train a model for the biographical event detection task. The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.

artificial intelligence, natural language, text processing, (17 more...)

arXiv.org Artificial Intelligence

arXiv:2306.09505

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.14)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry: Media (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.82)

Meta AI's open-source system attempts to right gender bias in Wikipedia biographies

#artificialintelligence, Apr-1-2022, 15:07:11 GMT

By this point, it's become reflexive: When searching for something on Google, Wikipedia is the de facto go-to first page. The website is consistently among the top 10 most-visited websites in the world. Yet, not all changemakers and historical figures are equally represented on the dominant web encyclopedia. Just 20% of Wikipedia biographies are about women.

biography, open-source system attempt, wikipedia biography, (12 more...)

#artificialintelligence

Country:

Africa (0.06)
Europe > France (0.05)
Asia (0.05)

Technology:

Information Technology > Communications > Social Media (0.98)
Information Technology > Artificial Intelligence > Natural Language (0.73)
