AITopics | Jagielski, Matthew

Collaborating Authors

Jagielski, Matthew

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Language Models May Verbatim Complete TextThey Were Not Explicitly Trained On

Liu, Ken Ziyu, Choquette-Choo, Christopher A., Jagielski, Matthew, Kairouz, Peter, Koyejo, Sanmi, Liang, Percy, Papernot, Nicolas

arXiv.org Artificial IntelligenceMar-21-2025

An important question today is whether a given text was used to train a large language model (LLM). A \emph{completion} test is often employed: check if the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, it is defined as a member based on the $n$-gram overlap between the target text and any text in the dataset. In this work, we demonstrate that this $n$-gram based membership definition can be effectively gamed. We study scenarios where sequences are \emph{non-members} for a given $n$ and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps. They showcase that it is difficult to find a single viable choice of $n$ for membership definitions. Using these insights, we design adversarial datasets that can cause a given target sequence to be completed without containing it, for any reasonable choice of $n$. Our findings highlight the inadequacy of $n$-gram membership, suggesting membership definitions fail to account for auxiliary information available to the training algorithm.

artificial intelligence, large language model, natural language, (8 more...)

arXiv.org Artificial Intelligence

2503.17514

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training

Borkar, Jaydeep, Jagielski, Matthew, Lee, Katherine, Mireshghallah, Niloofar, Smith, David A., Choquette-Choo, Christopher A.

arXiv.org Artificial IntelligenceFeb-21-2025

Due to the sensitive nature of personally identifiable information (PII), its owners may have the authority to control its inclusion or request its removal from large-language model (LLM) training. Beyond this, PII may be added or removed from training datasets due to evolving dataset curation techniques, because they were newly scraped for retraining, or because they were included in a new downstream fine-tuning stage. We find that the amount and ease of PII memorization is a dynamic property of a model that evolves throughout training pipelines and depends on commonly altered design choices. We characterize three such novel phenomena: (1) similar-appearing PII seen later in training can elicit memorization of earlier-seen sequences in what we call assisted memorization, and this is a significant factor (in our settings, up to 1/3); (2) adding PII can increase memorization of other PII significantly (in our settings, as much as $\approx\!7.5\times$); and (3) removing PII can lead to other PII being memorized. Model creators should consider these first- and second-order privacy risks when training models to avoid the risk of new PII regurgitation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.1568

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Huang, Yangsibo, Nasr, Milad, Angelopoulos, Anastasios, Carlini, Nicholas, Chiang, Wei-Lin, Choquette-Choo, Christopher A., Ippolito, Daphne, Jagielski, Matthew, Lee, Katherine, Liu, Ken Ziyu, Stoica, Ion, Tramer, Florian, Zhang, Chiyuan

arXiv.org Artificial IntelligenceJan-13-2025

It is now common to evaluate Large Language Models (LLMs) by having humans manually vote to evaluate model outputs, in contrast to typical benchmarks that evaluate knowledge or skill at some particular task. Chatbot Arena, the most popular benchmark of this type, ranks models by asking users to select the better response between two randomly selected models (without revealing which model was responsible for the generations). These platforms are widely trusted as a fair and accurate measure of LLM capabilities. In this paper, we show that if bot protection and other defenses are not implemented, these voting-based benchmarks are potentially vulnerable to adversarial manipulation. Specifically, we show that an attacker can alter the leaderboard (to promote their favorite model or demote competitors) at the cost of roughly a thousand votes (verified in a simulated, offline version of Chatbot Arena). Our attack consists of two steps: first, we show how an attacker can determine which model was used to generate a given reply with more than $95\%$ accuracy; and then, the attacker can use this information to consistently vote for (or against) a target model. Working with the Chatbot Arena developers, we identify, propose, and implement mitigations to improve the robustness of Chatbot Arena against adversarial manipulation, which, based on our analysis, substantially increases the cost of such attacks. Some of these defenses were present before our collaboration, such as bot protection with Cloudflare, malicious user detection, and rate limiting. Others, including reCAPTCHA and login are being integrated to strengthen the security in Chatbot Arena.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.07493

Country:

North America > United States (0.14)
Europe > Middle East > Malta (0.14)
Europe > Italy (0.14)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

Cooper, A. Feder, Choquette-Choo, Christopher A., Bogen, Miranda, Jagielski, Matthew, Filippova, Katja, Liu, Ken Ziyu, Chouldechova, Alexandra, Hayes, Jamie, Huang, Yangsibo, Mireshghallah, Niloofar, Shumailov, Ilia, Triantafillou, Eleni, Kairouz, Peter, Mitchell, Nicole, Liang, Percy, Ho, Daniel E., Choi, Yejin, Koyejo, Sanmi, Delgado, Fernando, Grimmelmann, James, Shmatikov, Vitaly, De Sa, Christopher, Barocas, Solon, Cyphert, Amy, Lemley, Mark, boyd, danah, Vaughan, Jennifer Wortman, Brundage, Miles, Bau, David, Neel, Seth, Jacobs, Abigail Z., Terzis, Andreas, Wallach, Hanna, Papernot, Nicolas, Lee, Katherine

arXiv.org Artificial IntelligenceDec-9-2024

We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model's parameters, e.g., a particular individual's personal data or in-copyright expression of Spiderman that was included in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for thinking rigorously about these challenges, which enables us to be clear about why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact. We aim for conceptual clarity and to encourage more thoughtful communication among machine learning (ML), law, and policy experts who seek to develop and apply technical methods for compliance with policy objectives.

information, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2412.06966

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Qi, Xiangyu, Wei, Boyi, Carlini, Nicholas, Huang, Yangsibo, Xie, Tinghao, He, Luxi, Jagielski, Matthew, Nasr, Milad, Mittal, Prateek, Henderson, Peter

arXiv.org Artificial IntelligenceDec-9-2024

Stakeholders -- from model developers to policymakers -- seek to minimize the dual-use risks of large language models (LLMs). An open challenge to this goal is whether technical safeguards can impede the misuse of LLMs, even when models are customizable via fine-tuning or when model weights are fully open. In response, several recent studies have proposed methods to produce durable LLM safeguards for open-weight LLMs that can withstand adversarial modifications of the model's weights via fine-tuning. This holds the promise of raising adversaries' costs even under strong threat models where adversaries can directly fine-tune model weights. However, in this paper, we urge for more careful characterization of the limits of these approaches. Through several case studies, we demonstrate that even evaluating these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. We draw lessons from the evaluation pitfalls that we identify and suggest future research carefully cabin claims to more constrained, well-defined, and rigorously examined threat models, which can provide more useful and candid assessments to stakeholders.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.07097

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

Steinke, Thomas, Nasr, Milad, Ganesh, Arun, Balle, Borja, Choquette-Choo, Christopher A., Jagielski, Matthew, Hayes, Jamie, Thakurta, Abhradeep Guha, Smith, Adam, Terzis, Andreas

arXiv.org Artificial IntelligenceOct-10-2024

We propose a simple heuristic privacy analysis of noisy clipped stochastic gradient descent (DP-SGD) in the setting where only the last iterate is released and the intermediate iterates remain hidden. Namely, our heuristic assumes a linear structure for the model. We show experimentally that our heuristic is predictive of the outcome of privacy auditing applied to various training procedures. Thus it can be used prior to training as a rough estimate of the final privacy leakage. We also probe the limitations of our heuristic by providing some artificial counterexamples where it underestimates the privacy leakage. The standard composition-based privacy analysis of DP-SGD effectively assumes that the adversary has access to all intermediate iterates, which is often unrealistic. However, this analysis remains the state of the art in practice. While our heuristic does not replace a rigorous privacy analysis, it illustrates the large gap between the best theoretical upper bounds and the privacy auditing lower bounds and sets a target for further work to improve the theoretical privacy analyses. We also empirically support our heuristic and show existing privacy auditing attacks are bounded by our heuristic analysis in both vision and language tasks.

artificial intelligence, iteration, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.06186

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Shumailov, Ilia, Hayes, Jamie, Triantafillou, Eleni, Ortiz-Jimenez, Guillermo, Papernot, Nicolas, Jagielski, Matthew, Yona, Itay, Howard, Heidi, Bagdasaryan, Eugene

arXiv.org Artificial IntelligenceJun-27-2024

Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently unlearning is often discussed as an approach for removal of impermissible knowledge i.e. knowledge that the model should not possess such as unlicensed copyrighted, inaccurate, or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper we revisit the paradigm in which unlearning is used for in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation. We discuss feasibility of ununlearning for modern LLMs and examine broader implications.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.00106

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.68)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.42)

Add feedback

Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

Chaudhari, Harsh, Severi, Giorgio, Abascal, John, Jagielski, Matthew, Choquette-Choo, Christopher A., Nasr, Milad, Nita-Rotaru, Cristina, Oprea, Alina

arXiv.org Artificial IntelligenceMay-30-2024

Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves impressive utility in many applications, its adoption to enable personalized generative models introduces new security risks. In this work, we propose new attack surfaces for an adversary to compromise a victim's RAG system, by injecting a single malicious document in its knowledge database. We design Phantom, general two-step attack framework against RAG augmented LLMs. The first step involves crafting a poisoned document designed to be retrieved by the RAG system within the top-k results only when an adversarial trigger, a specific sequence of words acting as backdoor, is present in the victim's queries. In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks in the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.20485

Country: Oceania > Australia (0.14)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Noise Masking Attacks and Defenses for Pretrained Speech Models

Jagielski, Matthew, Thakkar, Om, Wang, Lun

arXiv.org Artificial IntelligenceApr-2-2024

Speech models are often trained on sensitive data in order to improve model performance, leading to potential privacy leakage. Our work considers noise masking attacks, introduced by Amid et al. 2022, which attack automatic speech recognition (ASR) models by requesting a transcript of an utterance which is partially replaced with noise. They show that when a record has been seen at training time, the model will transcribe the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models, to attack pretrained speech encoders. Our method fine-tunes the encoder to produce an ASR model, and then performs noise masking on this model, which we find recovers private information from the pretraining data, despite the model never having seen transcripts at pretraining time! We show how to improve the precision of these attacks and investigate a number of countermeasures to our attacks.

artificial intelligence, machine learning, noise, (18 more...)

arXiv.org Artificial Intelligence

2404.02052

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Add feedback

Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

Carranza, Aldo Gael, Farahani, Rezsa, Ponomareva, Natalia, Kurakin, Alex, Jagielski, Matthew, Nasr, Milad

arXiv.org Artificial IntelligenceJan-8-2024

We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradient. To address this issue, we propose an approach that prioritizes ensuring query privacy prior to training a deep retrieval system. Our method employs DP language models (LMs) to generate private synthetic queries representative of the original data. These synthetic queries can be used in downstream retrieval system training without compromising privacy. Our approach demonstrates a significant enhancement in retrieval quality compared to direct DP-training, all while maintaining query-level privacy guarantees. This work highlights the potential of harnessing LMs to overcome limitations in standard DP-training methods.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.05973

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Data Science > Data Mining > Big Data (0.40)

Add feedback