Goto

Collaborating Authors

 data reconstruction attack


GAN You See Me? Enhanced Data Reconstruction Attacks against Split Inference Ziang Li1, Mengda Y ang

Neural Information Processing Systems

To overcome these challenges, we propose a G AN-based LA tent S pace S earch attack ( GLASS) that harnesses abundant prior knowledge from public data using advanced StyleGAN technologies. Additionally, we introduce GLASS++ to enhance reconstruction stability.


Private Frequency Estimation Via Residue Number Systems

Arcolezi, Héber H.

arXiv.org Artificial Intelligence

We present \textsf{ModularSubsetSelection} (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size $k$ and $n$ users, our $\varepsilon$-LDP mechanism encodes each input via a Residue Number System (RNS) over $\ell$ pairwise-coprime moduli $m_0, \ldots, m_{\ell-1}$, and reports a randomly chosen index $j \in [\ell]$ along with the perturbed residue using the statistically optimal \textsf{SubsetSelection} (SS) (Wang et al. 2016). This design reduces the user communication cost from $Θ\bigl(ω\log_2(k/ω)\bigr)$ bits required by standard SS (with $ω\approx k/(e^\varepsilon+1)$) down to $\lceil \log_2 \ell \rceil + \lceil \log_2 m_j \rceil$ bits, where $m_j < k$. Server-side decoding runs in $Θ(n + r k \ell)$ time, where $r$ is the number of LSMR (Fong and Saunders 2011) iterations. In practice, with well-conditioned moduli (\textit{i.e.}, constant $r$ and $\ell = Θ(\log k)$), this becomes $Θ(n + k \log k)$. We prove that MSS achieves worst-case MSE within a constant factor of state-of-the-art protocols such as SS and \textsf{ProjectiveGeometryResponse} (PGR) (Feldman et al. 2022) while avoiding the algebraic prerequisites and dynamic-programming decoder required by PGR. Empirically, MSS matches the estimation accuracy of SS, PGR, and \textsf{RAPPOR} (Erlingsson, Pihur, and Korolova 2014) across realistic $(k, \varepsilon)$ settings, while offering faster decoding than PGR and shorter user messages than SS. Lastly, by sampling from multiple moduli and reporting only a single perturbed residue, MSS achieves the lowest reconstruction-attack success rate among all evaluated LDP protocols.


GAN You See Me? Enhanced Data Reconstruction Attacks against Split Inference Ziang Li1, Mengda Y ang

Neural Information Processing Systems

To overcome these challenges, we propose a G AN-based LA tent S pace S earch attack ( GLASS) that harnesses abundant prior knowledge from public data using advanced StyleGAN technologies. Additionally, we introduce GLASS++ to enhance reconstruction stability.


DRAG: Data Reconstruction Attack using Guided Diffusion

Lei, Wa-Kin, Chen, Jun-Cheng, Chen, Shang-Tse

arXiv.org Artificial Intelligence

With the rise of large foundation models, split inference (SI) has emerged as a popular computational paradigm for deploying models across lightweight edge devices and cloud servers, addressing data privacy and computational cost concerns. However, most existing data reconstruction attacks have focused on smaller CNN classification models, leaving the privacy risks of foundation models in SI settings largely unexplored. To address this gap, we propose a novel data reconstruction attack based on guided diffusion, which leverages the rich prior knowledge embedded in a latent diffusion model (LDM) pre-trained on a large-scale dataset. Our method performs iterative reconstruction on the LDM's learned image prior, effectively generating high-fidelity images resembling the original data from their intermediate representations (IR). Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, both qualitatively and quantitatively, in reconstructing data from deep-layer IRs of the vision foundation model. The results highlight the urgent need for more robust privacy protection mechanisms for large models in SI scenarios. Code is available at: https://github.com/ntuaislab/DRAG.


On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Wynne, George

arXiv.org Machine Learning

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem, characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalance and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former is a first in the literature.


DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences

Ju, Bocheng, Fan, Junchao, Liu, Jiaqi, Chang, Xiaolin

arXiv.org Artificial Intelligence

--Federated learning enables collaborative machine learning while preserving data privacy. However, the rise of federated unlearning, designed to allow clients to erase their data from the gl obal model, introduces new privacy concerns. Specifically, the gradient exchanges during the unlearning process can leak sensitive information about deleted data. In this paper, we introduce DRAGD, a novel attack that exploits gradient discrepancies before and after unlearning to reconstruct forgotten data. We also present DRAGDP, an enhanced version of DRAGD that leverages publicly available prior data to improve reconstruction accuracy, particularly for complex datasets like facial images. Extensive experiments across multiple datasets demonst rate that DRAGD and DRAGDP significantly outperform existing methods in data reconstruction .Our work highlights a critical privacy vulnerability in federa ted unlearning and offers a practical solution, advancing the security of federated unlearning systems in real -world applications. VER recent years, federated learning (FL) [1]-[6] has emerged as a transformative paradigm for collaborative machine learning, enabling multiple clients to jointly train a global model without sharing their raw data. This decentralized approach is increasingly favored for its ability to enhance data privacy and security in sensitive domains such as healthcare, finance, and mobile applications.


Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates

Yue, Kai, Jin, Richeng, Wong, Chau-Wai, Dai, Huaiyu

arXiv.org Artificial Intelligence

Federated learning (FL) enables decentralized machine learning without sharing raw data, allowing multiple clients to collaboratively learn a global model. However, studies reveal that privacy leakage is possible under commonly adopted FL protocols. In particular, a server with access to client gradients can synthesize data resembling the clients' training data. In this paper, we introduce a novel threat model in FL, named the maliciously curious client, where a client manipulates its own gradients with the goal of inferring private data from peers. This attacker uniquely exploits the strength of a Byzantine adversary, traditionally aimed at undermining model robustness, and repurposes it to facilitate data reconstruction attack. We begin by formally defining this novel client-side threat model and providing a theoretical analysis that demonstrates its ability to achieve significant reconstruction success during FL training. To demonstrate its practical impact, we further develop a reconstruction algorithm that combines gradient inversion with malicious update strategies. Our analysis and experimental results reveal a critical blind spot in FL defenses: both server-side robust aggregation and client-side privacy mechanisms may fail against our proposed attack. Surprisingly, standard server- and client-side defenses designed to enhance robustness or privacy may unintentionally amplify data leakage. Compared to the baseline approach, a mistakenly used defense may instead improve the reconstructed image quality by 10-15%.


SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark

Wen, Rui, Liu, Yiyong, Backes, Michael, Zhang, Yang

arXiv.org Artificial Intelligence

Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no consensus on a formal definition of data reconstruction attacks or appropriate evaluation metrics for measuring their quality. This lack of rigorous definitions and universal metrics has hindered further advancement in this field. In this paper, we address this issue in the vision domain by proposing a unified attack taxonomy and formal definitions of data reconstruction attacks. We first propose a set of quantitative evaluation metrics that consider important criteria such as quantifiability, consistency, precision, and diversity. Additionally, we leverage large language models (LLMs) as a substitute for human judgment, enabling visual evaluation with an emphasis on high-quality reconstructions. Using our proposed taxonomy and metrics, we present a unified framework for systematically evaluating the strengths and limitations of existing attacks and establishing a benchmark for future research. Empirical results, primarily from a memorization perspective, not only validate the effectiveness of our metrics but also offer valuable insights for designing new attacks.


GAN You See Me? Enhanced Data Reconstruction Attacks against Split Inference

Neural Information Processing Systems

Split Inference (SI) is an emerging deep learning paradigm that addresses computational constraints on edge devices and preserves data privacy through collaborative edge-cloud approaches. However, SI is vulnerable to Data Reconstruction Attacks (DRA), which aim to reconstruct users' private prediction instances. Existing attack methods suffer from various limitations. Optimization-based DRAs do not leverage public data effectively, while Learning-based DRAs depend heavily on auxiliary data quantity and distribution similarity. Consequently, these approaches yield unsatisfactory attack results and are sensitive to defense mechanisms. To overcome these challenges, we propose a GAN-based LAtent Space Search attack (GLASS) that harnesses abundant prior knowledge from public data using advanced StyleGAN technologies.


Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory

Zhang, Xiaojin, Chen, Wei

arXiv.org Machine Learning

Federated learning has emerged as a promising paradigm for collaborative model training while preserving data privacy. However, recent studies have shown that it is vulnerable to various privacy attacks, such as data reconstruction attacks. In this paper, we provide a theoretical analysis of privacy leakage in federated learning from two perspectives: linear algebra and optimization theory. From the linear algebra perspective, we prove that when the Jacobian matrix of the batch data is not full rank, there exist different batches of data that produce the same model update, thereby ensuring a level of privacy. We derive a sufficient condition on the batch size to prevent data reconstruction attacks. From the optimization theory perspective, we establish an upper bound on the privacy leakage in terms of the batch size, the distortion extent, and several other factors. Our analysis provides insights into the relationship between privacy leakage and various aspects of federated learning, offering a theoretical foundation for designing privacy-preserving federated learning algorithms.