mia
Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution
Diffusion models have transformed image synthesis through iterative denoising, by defining trajectories from noise to coherent data. While their capabilities are widely celebrated, a critical challenge remains unaddressed: ensuring responsible use by verifying whether an image originates from a model's training set, its novel generations or external sources. We introduce a framework that analyzes diffusion trajectories for this purpose. Specifically, we demonstrate that temporal dynamics across the entire trajectory allow for more robust classification and challenge the widely-adopted "Goldilocks zone" conjecture, which posits that membership inference is effective only within narrow denoising stages. More fundamentally, we expose critical flaws in current membership inference practices by showing that representative methods fail under distribution shifts or when model-generated data is present. For model attribution, we demonstrate a first white-box approach directly applicable to diffusion. Ultimately, we propose the unification of data provenance into a single, cohesive framework tailored to modern generative systems.
Generative Model Inversion Through the Lens of the Manifold Hypothesis
Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process, yielding reconstructions with high visual quality and strong fidelity to the private data. To explore the reason behind their effectiveness, we begin by examining the gradients of inversion loss w.r.t.
Practical Bayes-Optimal Membership Inference Attacks
We develop practical and theoretically grounded membership inference attacks (MIAs) against both independent and identically distributed (i.i.d.) data and graph-structured data. Building on the Bayesian decision-theoretic framework of Sabrayolles et al., we derive the Bayes-optimal membership inference rule for node-level MIAs against graph neural networks, addressing key open questions about optimal query strategies in the graph setting. We introduce BASE and G-BASE, tractable approximations of the Bayes-optimal membership inference. G-BASE achieves superior performance compared to previously proposed classifier-based node-level MIA attacks. BASE, which is also applicable to non-graph data, matches or exceeds the performance of prior state-of-the-art MIAs, such as LiRA and RMIA, at a significantly lower computational cost. Finally, we show that BASE and RMIA are equivalent under a specific hyperparameter setting, providing a principled, Bayes-optimal justification for the RMIA attack.
Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity
Wang, Weiqi, Tian, Zhiyi, Zhang, Chenhan, Chen, Luoyu, Yu, Shui
Machine unlearning is a fundamental mechanism that enforces the right to be forgotten. Existing unlearning studies that rely on label manipulation or task-gradient reversal often deliver limited unlearning effectiveness. Moreover, they can undermine the original learning objective and typically do not guarantee equivalence to standard unlearning by retraining. In this paper, we propose \textbf{ManiF-SMC} (\textbf{Mani}fold \textbf{F}orgetting with \textbf{S}elf \textbf{M}ode \textbf{C}onnectivity), motivated by the observation that a model retrained on the remaining data tends to classify erased samples by their semantic similarity to the retained data. We begin with systematically recasting the approximate unlearning as pushing each erased sample away from its original learned manifold representation centroid toward its nearest semantic neighbors in the retained data. This reformulation aligns unlearning with retraining behavior and operates purely in representation space, reducing reliance on labels and task-specific gradients. To tackle the manifold representation-based unlearning problem, ManiF-SMC encapsulates the unlearning and representation preservation goals in a margin-based triplet loss. Because finding a suitable margin for unlearning is challenging, we propose a self-mode-connectivity module that rapidly reconstructs the local manifold to guide the adaptive margins generation for each unlearning case. Extensive experiments on four representative datasets show that ManiF-SMC achieves unlearning effectiveness comparable to state-of-the-art approximate methods while operating solely within the model's representation space.
Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Existing MIAs designed for large language models (LLMs) can be bifurcated into two types: reference-free and reference-based attacks. Although reference-based attacks appear promising performance by calibrating the probability measured on the target model with reference models, this illusion of privacy risk heavily depends on a reference dataset that closely resembles the training set. Both two types of attacks are predicated on the hypothesis that training records consistently maintain a higher probability of being sampled. However, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. Thus, these reasons lead to high false-positive rates of MIAs in practical scenarios.We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs.Furthermore, we introduce probabilistic variation, a more reliable membership signal based on LLM memorization rather than overfitting, from which we rediscover the neighbour attack with theoretical grounding. Comprehensive evaluation conducted on three datasets and four exemplary LLMs shows that SPV-MIA raises the AUC of MIAs from 0.7 to a significantly high level of 0.9.
LLM Dataset Inference: Did you train on my dataset?
Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time).Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions.Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models.