fc4ddc15f9f4b4b06ef7844d6bb53abf-AuthorFeedback.pdf

Neural Information Processing Systems

A: We omitted this accidentally, but will definitely reference it in our revised version. Carlini et al. demonstrate privacy risks on models trained with standard SGD. Their attacks do not hold even with very weak differential privacy guarantees. In fact, we also evaluate our attack using two-layer neural networks, and the performance is similar; see Figure 2 (d), (e), (f), and Tables 1 and 3.

Q: What does it mean for yp to be the smallest probability class on xp?
A: The class which the model predicts with the smallest probability.


A Method

Neural Information Processing Systems

As computing the inverse of the second-order derivatives is the most computation-intensive operation, we focus on it. In Section 3.1, we use a least-squares trick to compute this inverse; alternatively, we can leverage the Neumann series to compute the matrix inverse.

B.1 Proof of the Approximation by Implicit Gradients. Here, we provide the proof for J.

B.2 Proof of Theorem 3.1. Before proving our main theorem, we establish several essential lemmas, which follow directly from Assumptions 3.4 and 3.5 together with Lemmas B.1 and B.2.

The linear model we use is a matrix that maps the input data into a vector. The LeNet model is a convolutional neural network with 4 convolutional layers and 1 fully connected layer.
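The Neumann-series idea mentioned above can be sketched concretely: when the spectral norm of (I - A) is below 1, the truncated series sum_{k=0}^{K} (I - A)^k converges to A^{-1}. A minimal illustration (not the paper's implementation; the matrix and scaling below are made up for the demo):

```python
import numpy as np

def neumann_inverse(A, num_terms=200):
    """Approximate A^{-1} with the truncated Neumann series
    sum_{k=0}^{K} (I - A)^k, valid when ||I - A|| < 1."""
    n = A.shape[0]
    I = np.eye(n)
    term = I.copy()       # current power (I - A)^k, starting at k = 0
    approx = I.copy()     # running partial sum
    for _ in range(1, num_terms):
        term = term @ (I - A)
        approx += term
    return approx

# Toy check on a symmetric matrix scaled so that ||I - A|| < 1.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T / 50 + 0.5 * np.eye(5)   # eigenvalues pushed into (0, 1)
err = np.abs(neumann_inverse(A) @ A - np.eye(5)).max()
```

In practice one never forms the full inverse this way; the same recursion is applied to Hessian-vector products so that only matrix-vector operations are needed.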



Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Neural Information Processing Systems

Machine unlearning is motivated by principles of data autonomy. The premise is that a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that these updates expose individuals to high-accuracy reconstruction attacks which allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
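The intuition behind the linear-regression attack can be sketched in a few lines. For least squares, the exact-retraining update satisfies (H - x x^T)(theta_minus - theta_full) = x (x^T theta_full - y), so the parameter difference, multiplied by the remaining-data Hessian, is proportional to the deleted point. The toy below is an illustration of this leakage under an oracle assumption (the attacker is handed the Hessian), not the paper's actual attack, which does not require this knowledge:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

def ols(X, y):
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

theta_full = ols(X, y)
# "Unlearn" the last point by exact retraining without it.
x_del = X[-1]
theta_minus = ols(X[:-1], y[:-1])

# Multiplying the observed parameter difference by the remaining-data
# Hessian recovers the deleted point up to a scalar (residual) factor.
H_minus = X[:-1].T @ X[:-1]
v = H_minus @ (theta_minus - theta_full)
cos = abs(v @ x_del) / (np.linalg.norm(v) * np.linalg.norm(x_del))
```

Here `cos` is essentially 1: the direction of the deleted point is exposed exactly by the pair of released models.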


Private Attribute Inference from Images with Vision-Language Models

Neural Information Processing Systems

As large language models (LLMs) become ubiquitous in our daily tasks and digital interactions, associated privacy risks are increasingly in focus. While LLM privacy research has primarily focused on the leakage of model training data, it has recently been shown that LLMs can make accurate privacy-infringing inferences from previously unseen texts. With the rise of vision-language models (VLMs), capable of understanding both images and text, a key question is whether this concern transfers to the previously unexplored domain of benign images posted online. To answer this question, we compile an image dataset with human-annotated labels of the image owner's personal attributes. In order to understand the privacy risks posed by VLMs beyond traditional human attribute recognition, our dataset consists of images where the inferable private attributes do not stem from direct depictions of humans. On this dataset, we evaluate 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy. Concerningly, we observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger inferential adversaries, establishing an imperative for the development of adequate defenses.


When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary approaches have emerged for adapting LLMs to tabular data generation: (i) fine-tuning smaller models directly on tabular datasets, and (ii) prompting larger models with examples provided in context. In this work, we show that popular implementations from both regimes exhibit a tendency to compromise privacy by reproducing memorized patterns of numeric digits from their training data. To systematically analyze this risk, we introduce a simple No-box Membership Inference Attack (MIA) called LevAtt that assumes adversarial access to only the generated synthetic data and targets the string sequences of numeric digits in synthetic observations. Using this approach, our attack exposes substantial privacy leakage across a wide range of models and datasets, and in some cases, is even a perfect membership classifier on state-of-the-art models. Our findings highlight a unique privacy vulnerability of LLM-based synthetic data generation and the need for effective defenses. To this end, we propose two methods, including a novel sampling strategy that strategically perturbs digits during generation. Our evaluation demonstrates that this approach can defeat these attacks with minimal loss of fidelity and utility of the synthetic data.
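A distance-based membership score of the kind the abstract describes can be sketched as follows. This is a hedged illustration, not the authors' LevAtt implementation: it extracts the digit string of each record, then scores a candidate by its minimum Levenshtein distance to any synthetic row (lower distance suggests memorization). The example records are invented for the demo:

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,           # deletion
                           cur[j - 1] + 1,        # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def digit_string(record: dict) -> str:
    """Concatenate the numeric digits appearing in a tabular record."""
    return "".join(re.findall(r"\d", "".join(str(v) for v in record.values())))

def membership_score(candidate: dict, synthetic: list) -> int:
    """Min edit distance to any synthetic row; lower -> more likely a member."""
    c = digit_string(candidate)
    return min(levenshtein(c, digit_string(s)) for s in synthetic)

# Toy example: a verbatim-memorized row scores 0; an unrelated row scores higher.
synth = [{"age": 47, "zip": "90210", "income": 83250}]
member = {"age": 47, "zip": "90210", "income": 83250}
non_member = {"age": 31, "zip": "10001", "income": 52000}
```

The digit-perturbation defense mentioned above would, under this scoring, push members' minimum distances away from 0 at a small cost in numeric fidelity.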