 attack accuracy




Model Inversion Attack Against Deep Hashing

Zhao, Dongdong, Xu, Qiben, Fang, Ranxin, Song, Baogang

arXiv.org Artificial Intelligence

Deep hashing improves retrieval efficiency through compact binary codes, yet it introduces severe and often overlooked privacy risks. The ability to reconstruct original training data from hash codes could lead to serious threats such as biometric forgery and privacy breaches. However, model inversion attacks specifically targeting deep hashing models remain unexplored, leaving their security implications unexamined. This research gap stems from the inaccessibility of genuine training hash codes and the highly discrete Hamming space, which prevents existing methods from adapting to deep hashing. To address these challenges, we propose DHMI, the first diffusion-based model inversion framework designed for deep hashing. DHMI first clusters an auxiliary dataset to derive semantic hash centers as surrogate anchors. It then introduces a surrogate-guided denoising optimization method that leverages a novel attack metric (fusing classification consistency and hash proximity) to dynamically select candidate samples. A cluster of surrogate models guides the refinement of these candidates, ensuring the generation of high-fidelity and semantically consistent images. Experiments on multiple datasets demonstrate that DHMI successfully reconstructs high-resolution, high-quality images even under the most challenging black-box setting, where no training hash codes are available. Our method outperforms the existing state-of-the-art model inversion attacks in black-box scenarios, confirming both its practical efficacy and the critical privacy risks inherent in deep hashing systems.
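As a rough illustration of the kind of metric the abstract describes, the sketch below ranks candidate images by a score that fuses a classification-consistency term with a Hamming-proximity term. The function names, the weight `alpha`, the linear fusion, and the toy inputs are all assumptions made for illustration; the abstract does not say how DHMI actually combines the two signals.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def attack_score(class_prob, code, center, alpha=0.5):
    """Hypothetical fused score in [0, 1]; higher is better.

    class_prob: surrogate probability for the target class (assumed input)
    code:       binary hash code of the candidate image
    center:     surrogate hash center of the target class
    alpha:      illustrative weight between the two terms (assumption)
    """
    proximity = 1.0 - hamming_distance(code, center) / len(center)
    return alpha * class_prob + (1.0 - alpha) * proximity


def select_candidates(candidates, center, k=1, alpha=0.5):
    """Keep the k candidates with the highest fused score."""
    ranked = sorted(
        candidates,
        key=lambda c: attack_score(c["class_prob"], c["code"], center, alpha),
        reverse=True,
    )
    return ranked[:k]


# Toy data: candidate "a" is classified confidently but its code is far from
# the center; candidate "b" is less confident but differs in only one bit.
center = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = [
    {"name": "a", "class_prob": 0.9, "code": [0, 1, 0, 0, 1, 1, 0, 1]},
    {"name": "b", "class_prob": 0.6, "code": [1, 0, 1, 1, 0, 0, 1, 1]},
]
best = select_candidates(candidates, center, k=1)[0]
print(best["name"])
```

With equal weighting, the candidate whose code sits close to the hash center wins even though its class probability is lower, which is the trade-off a fused metric is meant to expose.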


Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment

Ho, Sy-Tuyen, Hao, Koh Jun, Nguyen, Ngoc-Bao, Binder, Alexander, Cheung, Ngai-Man

arXiv.org Artificial Intelligence

Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. To evaluate such attacks, the standard evaluation framework relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI studies without question. In this paper, we present the first in-depth study of this evaluation framework. In particular, we identify a critical issue with this standard framework: Type-I adversarial examples. These are reconstructions that do not capture the visual features of private training data, yet are still deemed successful by T and ultimately transfer to E. Such false positives undermine the reliability of the standard MI evaluation framework. To address this issue, we introduce a new MI evaluation framework that replaces the evaluation model E with advanced Multimodal Large Language Models (MLLMs). By leveraging their general-purpose visual understanding, our MLLM-based framework does not depend on training under the same task design as T, thus reducing Type-I transferability and providing more faithful assessments of reconstruction success. Using our MLLM-based evaluation framework, we reevaluate 27 diverse MI attack setups and empirically reveal consistently high false positive rates under the standard evaluation framework. Importantly, we demonstrate that many state-of-the-art (SOTA) MI methods report inflated attack accuracy, indicating that actual privacy leakage is significantly lower than previously believed. By uncovering this critical issue and proposing a robust solution, our work enables a reassessment of progress in MI research and sets a new standard for reliable and robust evaluation. Code can be found at https://github.com/hosytuyen/MI-Eval-MLLM
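To make the quantities in this abstract concrete, the sketch below computes attack accuracy under two sets of verdicts and the false positive rate of the standard evaluation relative to a reference judge. The verdict lists are invented, the function names are hypothetical, and the definition FP / (FP + TN) is the usual textbook one; the paper may define its rate differently.

```python
def attack_accuracy(verdicts):
    """Fraction of reconstructions judged to be successful attacks."""
    return sum(verdicts) / len(verdicts)


def false_positive_rate(standard, reference):
    """FP / (FP + TN), treating the reference verdicts as ground truth."""
    fp = sum(1 for s, r in zip(standard, reference) if s and not r)
    tn = sum(1 for s, r in zip(standard, reference) if not s and not r)
    if fp + tn == 0:
        return 0.0  # no reference negatives, so the rate is undefined; report 0
    return fp / (fp + tn)


# Made-up verdicts for six reconstructions (True = counted as a success).
standard_eval = [True, True, True, False, True, False]     # evaluation model E
reference_eval = [True, False, False, False, True, False]  # reference judge (assumed)

print(attack_accuracy(standard_eval))
print(attack_accuracy(reference_eval))
print(false_positive_rate(standard_eval, reference_eval))
```

In this toy case the standard evaluation reports a higher attack accuracy than the reference judge, and the gap is exactly the two reconstructions that E accepts but the reference rejects.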




Common Q1: Theoretical justification on why AWP works

Neural Information Processing Systems

Common Q1: Theoretical justification on why AWP works. Based on previous work on PAC-Bayes bound (Neyshabur et al., NeurIPS 2017), in adversarial training, let R#1 Q1: The weights are constantly perturbed in the worst case, the model may find it difficult to learn. R#1 Q2: How do the baseline methods that do implicit weight perturbations differ from AWP? We did not claim that "baseline methods do the implicit weight perturbations". R#1 Q3: What is the difference of weights learned by AT-AWP and vanilla AT? R#2 Q1: Only CIFAR-10 and single neural networks are tested. We have tested several network architectures and datasets in the main body and appendix, e.g., PreAct ResNet-18, R#2 Q2: In Figure 1, the α value in the loss landscape is embed into training or post-training?


Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms

Belde, Abhinay Shankar, Ramkumar, Rohit, Rusert, Jonathan

arXiv.org Artificial Intelligence

Adversarial text attack research plays a crucial role in evaluating the robustness of NLP models. However, the increasing complexity of transformer-based architectures has dramatically raised the computational cost of attack testing, especially for researchers with limited resources (e.g., GPUs). Existing popular black-box attack methods often require a large number of queries, which can make them inefficient and impractical for researchers. To address these challenges, we propose two new attack selection strategies called Hybrid and Dynamic Select, which better combine the strengths of previous selection algorithms. Hybrid Select merges generalized BinarySelect techniques with GreedySelect by introducing a size threshold to decide which selection algorithm to use. Dynamic Select provides an alternative approach to combining the generalized BinarySelect and GreedySelect by learning which lengths of texts each selection method should be applied to. This greatly reduces the number of queries needed while maintaining attack effectiveness (a limitation of BinarySelect). Across 4 datasets and 6 target models, our best method (sentence-level Hybrid Select) is able to reduce the number of required queries per attack by up to 25.82% on average against both encoder models and LLMs, without losing the effectiveness of the attack.
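One way to picture a size threshold that switches between the two selection styles is sketched below. It assumes that the binary-style step halves the current span and keeps the half whose removal lowers the model's confidence more, and that the greedy step deletes each remaining word in turn. The toy "model", the word weights, and the exact switching rule are illustrative assumptions, not the authors' implementation.

```python
def make_counting_model(weights):
    """Toy stand-in for a black-box classifier that also counts its queries."""
    state = {"queries": 0}

    def confidence(words):
        state["queries"] += 1
        return sum(weights.get(w, 0.0) for w in words)

    return confidence, state


def greedy_select(words, lo, hi, confidence, base):
    """Delete each word in [lo, hi) in turn; return the index with the largest drop."""
    drops = {}
    for i in range(lo, hi):
        drops[i] = base - confidence(words[:i] + words[i + 1:])
    return max(drops, key=drops.get)


def hybrid_select(words, confidence, threshold=2):
    """Halve large spans; switch to greedy search once the span is small enough."""
    base = confidence(words)
    lo, hi = 0, len(words)
    while hi - lo > threshold:
        mid = (lo + hi) // 2
        drop_left = base - confidence(words[:lo] + words[mid:])   # left half removed
        drop_right = base - confidence(words[:mid] + words[hi:])  # right half removed
        if drop_left >= drop_right:
            hi = mid  # the influential word is more likely in the left half
        else:
            lo = mid
    return greedy_select(words, lo, hi, confidence, base)


words = "the film was truly wonderful and very moving".split()
confidence, state = make_counting_model(
    {"wonderful": 0.6, "moving": 0.3, "truly": 0.05}
)
idx = hybrid_select(words, confidence, threshold=2)
print(words[idx], state["queries"])
```

On this eight-word input, a purely greedy search would need one baseline query plus one per word; the hybrid variant spends two queries per halving step and only falls back to per-word deletion on the final small span.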


Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

Xiao, Yaxin, Ye, Qingqing, Hu, Li, Zheng, Huadi, Hu, Haibo, Liang, Zi, Li, Haoyang, Jiao, Yijie

arXiv.org Artificial Intelligence

Machine unlearning enables the removal of specific data from ML models to uphold the right to be forgotten. While approximate unlearning algorithms offer efficient alternatives to full retraining, this work reveals that they fail to adequately protect the privacy of unlearned data. In particular, these algorithms introduce implicit residuals which facilitate privacy attacks targeting unlearned data. We observe that these residuals persist regardless of model architectures, parameters, and unlearning algorithms, exposing a new attack surface beyond conventional output-based leakage. Based on this insight, we propose the Reminiscence Attack (ReA), which amplifies the correlation between residuals and membership privacy through targeted fine-tuning processes. ReA achieves up to 1.90× and 1.12× higher accuracy than prior attacks when inferring class-wise and sample-wise membership, respectively. To mitigate such residual-induced privacy risk, we develop a dual-phase approximate unlearning framework that first eliminates deep-layer unlearned data traces and then enforces convergence stability to prevent models from "pseudo-convergence", where their outputs are similar to retrained models but still preserve unlearned residuals. Our framework works for both classification and generation tasks. Experimental evaluations confirm that our approach maintains high unlearning efficacy, while reducing the adaptive privacy attack accuracy to nearly random guess, at the computational cost of 2-12% of full retraining from scratch.


Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning

Mo, Wenjin, Li, Zhiyuan, Fang, Minghong, Fang, Mingwei

arXiv.org Artificial Intelligence

Federated learning (FL) allows multiple clients to collaboratively train a global machine learning model with coordination from a central server, without needing to share their raw data. This approach is particularly appealing in the era of privacy regulations like the GDPR, leading many prominent companies to adopt it. However, FL's distributed nature makes it susceptible to poisoning attacks, where malicious clients, controlled by an attacker, send harmful data to compromise the model. Most existing poisoning attacks in FL aim to degrade the model's integrity, such as reducing its accuracy, with limited attention to privacy concerns from these attacks. In this study, we introduce FedPoisonMIA, a novel poisoning membership inference attack targeting FL. FedPoisonMIA involves malicious clients crafting local model updates to infer membership information. Additionally, we propose a robust defense mechanism to mitigate the impact of FedPoisonMIA attacks. Extensive experiments across various datasets demonstrate the attack's effectiveness, while our defense approach reduces its impact to a degree.