Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

Mei, Jian-Ping, Zhang, Weibin, Chen, Jie, Zhang, Xuyun, Zhu, Tiantian

Mar-16-2025–arXiv.org Artificial Intelligence

Malicious users attempt to replicate commercial models functionally at low cost by training a clone model with query responses. It is challenging to timely prevent such model-stealing attacks to achieve strong protection and maintain utility. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) to recognize queries from malicious users by leveraging account-wise local dependency. We formulate each class as a Multivariate Normal distribution (MVN) in the feature space and measure the malicious score as the sum of weighted class-wise distribution discrepancy. The ADD detector is combined with random-based prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Results of extensive experimental studies show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users for both soft and hard-label settings.

artificial intelligence, machine learning, query, (18 more...)

arXiv.org Artificial Intelligence

Mar-16-2025

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.14)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Information Technology > Security & Privacy (0.70)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.93)
  - Performance Analysis > Accuracy (0.68)