AITopics | gdpr

gdpr

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DataSIR: A Benchmark Dataset for Sensitive Information Recognition

Neural Information Processing Systems | Jun-16-2026, 09:13:35 GMT

With the rapid development of artificial intelligence technologies, the demand for training data has surged, exacerbating the risk of data leakage. Despite the increasing incidents and costs associated with such leaks, data leakage prevention (DLP) technologies lag behind evolving evasion techniques that bypass existing sensitive information recognition (SIR) models. Current datasets lack comprehensive coverage of these adversarial transformations, limiting the evaluation of robust SIR systems. To address this gap, we introduce DataSIR, a benchmark dataset specifically designed to evaluate SIR models on sensitive data subjected to diverse format transformations. We curate 26 sensitive data categories based on multiple international regulations and collect 131,890 original samples across them.
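To make the idea of format transformations that evade recognition concrete, here is a minimal sketch. It is not DataSIR's method or data: the detector, the two transformations, and all function names are hypothetical, chosen only to illustrate why a pattern-based SIR check can miss the same value once it is reformatted. The sample is a widely published test card number, not real data.

```python
import base64
import re

# Hypothetical detector (an assumption for illustration): 16 ASCII digits,
# optionally grouped in fours by a single space or hyphen.
CARD_PATTERN = re.compile(r"(?<![0-9])(?:[0-9]{4}[ -]?){3}[0-9]{4}(?![0-9])")


def naive_detect(text: str) -> bool:
    """Return True if the text contains a card-like number in its usual layout."""
    return CARD_PATTERN.search(text) is not None


def spread_with_separator(value: str, sep: str = ".") -> str:
    """Format transformation: put a separator between every character."""
    return sep.join(value)


def encode_base64(value: str) -> str:
    """Format transformation: Base64-encode the UTF-8 bytes of the value."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def normalized_detect(text: str) -> bool:
    """Strip everything except ASCII digits, then look for 16 digits in a row.

    This undoes separator insertion but not encodings such as Base64, and it
    can produce false positives on unrelated digit-heavy text.
    """
    digits = re.sub(r"[^0-9]", "", text)
    return re.search(r"[0-9]{16}", digits) is not None


SAMPLE = "4111 1111 1111 1111"  # published test number, not a real card

if __name__ == "__main__":
    for label, text in [
        ("original", SAMPLE),
        ("separator", spread_with_separator(SAMPLE)),
        ("base64", encode_base64(SAMPLE)),
    ]:
        print(label, naive_detect(text), normalized_detect(text))
```

In this sketch, normalization recovers the separator-spread value but not the Base64-encoded one, which would need a decoding step first. That gap between one transformation and the next is the kind of coverage a benchmark of transformed samples is meant to measure.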

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
(2 more...)

Parajudica: An RDF-Based Reasoner and Metamodel for Multi-Framework Context-Dependent Data Compliance Assessments

Moreau, Luc, Rossi, Alfred, Stalla-Bourdillon, Sophie

arXiv.org Artificial Intelligence | Dec-8-2025

We demonstrate the utility of this resource and accompanying metamodel through application to existing legal frameworks and industry standards, offering insights for comparative framework analysis. Applications include compliance policy enforcement, compliance monitoring, data discovery, and risk assessment.

artificial intelligence, classification, stalla-bourdillon parajudica, (17 more...)

arXiv.org Artificial Intelligence

2512.05453

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.66)

Beyond Accuracy: Rethinking Hallucination and Regulatory Response in Generative AI

Li, Zihao, Yi, Weiwei, Chen, Jiahong

arXiv.org Artificial Intelligence | Oct-27-2025

Hallucination in generative AI is often treated as a technical failure to produce factually correct output. Yet this framing underrepresents the broader significance of hallucinated content in language models, which may appear fluent, persuasive, and contextually appropriate while conveying distortions that escape conventional accuracy checks. This paper critically examines how regulatory and evaluation frameworks have inherited a narrow view of hallucination, one that prioritises surface verifiability over deeper questions of meaning, influence, and impact. We propose a layered approach to understanding hallucination risks, encompassing epistemic instability, user misdirection, and social-scale effects. Drawing on interdisciplinary sources and examining instruments such as the EU AI Act and the GDPR, we show that current governance models struggle to address hallucination when it manifests as ambiguity, bias reinforcement, or normative convergence. Rather than improving factual precision alone, we argue for regulatory responses that account for language's generative nature, the asymmetries between system and user, and the shifting boundaries between information, persuasion, and harm.

accuracy, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.13345

Country:

North America > Canada (0.28)
North America > Mexico (0.28)
North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Automated Boilerplate: Prevalence and Quality of Contract Generators in the Context of Swiss Privacy Policies

Nenadic, Luka, Rodriguez, David

arXiv.org Artificial Intelligence | Oct-8-2025

It has become increasingly challenging for firms to comply with a plethora of novel digital regulations. This is especially true for smaller businesses that often lack both the resources and know-how to draft complex legal documents. Instead of seeking costly legal advice from attorneys, firms may turn to cheaper alternative legal service providers such as automated contract generators. While these services have a long-standing presence, there is little empirical evidence on their prevalence and output quality. We address this gap in the context of a 2023 Swiss privacy law revision. To enable a systematic evaluation, we create and annotate a multilingual benchmark dataset that captures key compliance obligations under Swiss and EU privacy law. Using this dataset, we validate a novel GPT-5-based method for large-scale compliance assessment of privacy policies, allowing us to measure the impact of the revision. We observe compliance increases indicating an effect of the revision. Generators, explicitly referenced by 18% of local websites, are associated with substantially higher levels of compliance, with increases of up to 15 percentage points compared to privacy policies without generator use. These findings contribute to three debates: the potential of LLMs for cross-lingual legal analysis, the Brussels Effect of EU regulations, and, crucially, the role of automated tools in improving compliance and contractual quality.

generator, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2510.0586

Country:

North America > United States (1.00)
Europe > Switzerland (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law > Statutes (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces

Imperial, Joseph Marvin, Madabushi, Harish Tayyar

arXiv.org Artificial Intelligence | Sep-30-2025

Policy compliance assessment is a fundamental task of evaluating whether an input case strictly complies with a set of human-defined rules, more generally known as policies. In practice, human experts follow a systematic, step-by-step process to identify violations with respect to specific stipulations outlined in the policy. However, such documentation of gold-standard, expert-level reasoning processes is costly to acquire. In this paper, we introduce Policy Reasoning Traces (PRT), a form of specialized generated reasoning chains that serve as a reasoning bridge to improve an LLM's policy compliance assessment capabilities. Our empirical evaluations demonstrate that the use of PRTs for both inference-time and training-time scenarios significantly enhances the performance of open-weight and commercial models, setting a new state-of-the-art for HIPAA and GDPR policies. Beyond accuracy gains, we also highlight how PRTs can improve an LLM's ability to accurately cite policy clauses, as well as influence compliance decisions through their high utilization from the raw chains of thought.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.23291

Country:

North America > United States (0.46)
North America > Mexico (0.28)
Europe > Austria (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance

Hu, Wenbin, Jing, Huihao, Shi, Haochen, Li, Haoran, Song, Yangqiu

arXiv.org Artificial Intelligence | Sep-29-2025

Large Language Models (LLMs) have proliferated and demonstrated remarkable capabilities, making LLM safety critically important. However, existing safety methods rely on ad hoc taxonomies and lack rigorous, systematic protection, failing to ensure safety for the nuanced and complex behaviors of modern LLM systems. To address this problem, we approach LLM safety from a legal compliance perspective, which we term safety compliance. In this work, we posit relevant established legal frameworks as safety standards for defining and measuring safety compliance, including the EU AI Act and GDPR, which serve as core legal frameworks for AI safety and data security in Europe. To bridge the gap between LLM safety and legal compliance, we first develop a new benchmark for safety compliance by generating realistic LLM safety scenarios seeded with legal statutes. Subsequently, we align Qwen3-8B using Group Policy Optimization (GRPO) to construct a safety reasoner, Compliance Reasoner, which effectively aligns LLMs with legal standards to mitigate safety risks. Our comprehensive experiments demonstrate that the Compliance Reasoner achieves superior performance on the new benchmark, with average improvements of +10.45% for the EU AI Act and +11.85% for GDPR.
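For readers unfamiliar with GRPO, the sketch below shows the group-relative advantage step as it is commonly described: several responses are sampled for one prompt, each is scored, and every reward is standardized against its own group. This is not the paper's training code. The function name, the reward values, and the choice of population standard deviation are assumptions for illustration, and implementations differ on details such as the variance estimator.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against the mean and spread of its own group.

    `rewards` holds one scalar score per sampled response to the same prompt.
    `eps` avoids division by zero when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if a response was judged compliant, else 0.0.
    example_rewards = [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(example_rewards))
```

Responses scored above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.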

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2225

Country: Europe (0.66)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Can AI be Consentful?

Pistilli, Giada, Trevelin, Bruna

arXiv.org Artificial Intelligence | Jul-3-2025

The evolution of generative AI systems exposes the challenges of traditional legal and ethical frameworks built around consent. This chapter examines how the conventional notion of consent, while fundamental to data protection and privacy rights, proves insufficient in addressing the implications of AI-generated content derived from personal data. Through legal and ethical analysis, we show that while individuals can consent to the initial use of their data for AI training, they cannot meaningfully consent to the numerous potential outputs their data might enable or the extent to which the output is used or distributed. We identify three fundamental challenges: the scope problem, the temporality problem, and the autonomy trap, which collectively create what we term a "consent gap" in AI systems and their surrounding ecosystem. We argue that current legal frameworks inadequately address these emerging challenges, particularly regarding individual autonomy, identity rights, and social responsibility, especially in cases where AI-generated content creates new forms of personal representation beyond the scope of the original consent. By examining how these consent limitations intersect with broader principles of responsible AI - including fairness, transparency, accountability, and autonomy - we demonstrate the need to evolve ethical and legal approaches to consent.

artificial intelligence, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2507.01051

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
North America > United States > California (0.04)
North America > Canada (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

Concerns raised over AI trained on 57 million NHS medical records

New Scientist | May-7-2025, 14:28:38 GMT

An artificial intelligence model trained on the medical data of 57 million people who have used the National Health Service in England could one day assist doctors in predicting disease or forecast hospitalisation rates, its creators have claimed. However, other researchers say there are still significant privacy and data protection concerns around such large-scale use of health data, while even the AI's architects say they can't guarantee that it won't inadvertently reveal sensitive patient data. The model, called Foresight, was first developed in 2023. That initial version used OpenAI's GPT-3, the large language model (LLM) behind the first version of ChatGPT, and trained on 1.5 million real patient records from two London hospitals. Now, Chris Tomlinson at University College London and his colleagues have scaled up Foresight to create what they say is the world's first "national-scale generative AI model of health data" and the largest of its kind.

large language model, machine learning, natural language, (18 more...)

New Scientist

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.06)

Genre: Research Report (0.38)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.70)
Government > Regional Government > Europe Government > United Kingdom Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)

ChatGPT reportedly accused innocent man of murdering his children

Engadget | Mar-20-2025, 12:00:57 GMT

It has been over two years since ChatGPT exploded onto the world stage and, while OpenAI has advanced it in many ways, there are still quite a few hurdles. Now, Austrian advocacy group Noyb has filed its second complaint against OpenAI for such hallucinations, naming a specific instance in which ChatGPT reportedly, and wrongly, stated that a Norwegian man was a murderer. To make matters even worse, when this man asked ChatGPT what it knew about him, it reportedly stated that he was sentenced to 21 years in prison for killing two of his children and attempting to murder his third. The hallucination was also sprinkled with real information, including the number of children he had, their genders and the name of his home town. Noyb claims that this response put OpenAI in violation of GDPR.

large language model, machine learning, natural language, (14 more...)

Engadget

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data

Nolte, Henrik, Finck, Michèle, Meding, Kristof

arXiv.org Artificial Intelligence | Mar-3-2025

Does GPT know you? The answer depends on your level of public recognition; however, if your information was available on a website, the answer is probably yes. All Large Language Models (LLMs) memorize training data to some extent. If an LLM training corpus includes personal data, it also memorizes personal data. Developing an LLM typically involves processing personal data, which falls directly within the scope of data protection laws. If a person is identified or identifiable, the implications are far-reaching: the AI system is subject to EU General Data Protection Regulation requirements even after the training phase is concluded. To back our arguments: (1) We reiterate that LLMs output training data at inference time, be it verbatim or in generalized form. (2) We show that some LLMs can thus be considered personal data on their own. This triggers a cascade of data protection implications such as data subject rights, including rights to access, rectification, or erasure. These rights extend to the information embedded within the AI model. (3) This paper argues that machine learning researchers must acknowledge the legal implications of LLMs as personal data throughout the full ML development lifecycle, from data collection and curation to model provision on, e.g., GitHub or Hugging Face. (4) We propose different ways for the ML research community to deal with these legal implications. Our paper serves as a starting point for improving the alignment between data protection law and the technical capabilities of LLMs. Our findings underscore the need for more interaction between the legal domain and the ML community.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2503.0163

Country:

North America > United States (0.47)
Europe > France (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
