Increasing the Cost of Model Extraction with Calibrated Proof of Work
Adam Dziedzic, Muhammad Ahmad Kaleem, Yu Shen Lu, Nicolas Papernot
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on the obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries or on truncating or distorting outputs, necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this introduces only a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.

Model extraction attacks (Tramèr et al., 2016; Jagielski et al., 2020; Zanella-Beguelin et al., 2021) are a threat to the confidentiality of machine learning (ML) models. They are also used as reconnaissance prior to mounting other attacks, for example when an adversary wishes to disguise a spam message to get it past a target spam filter (Lowd & Meek, 2005), or to generate adversarial examples (Biggio et al., 2013; Szegedy et al., 2014) using the extracted model (Papernot et al., 2017b). Furthermore, an adversary can extract a functionally similar model even without access to any real training data (Krishna et al., 2020; Truong et al., 2021; Miura et al., 2021), bypassing the long and expensive process of data procurement, cleaning, and preprocessing. This harms the interests of the model owner and infringes on their intellectual property.

Defenses against model extraction can be categorized as active, passive, or reactive. Passive defenses try to detect an attack (Juuti et al., 2019) or truncate outputs (Tramèr et al., 2016), but these methods lower the quality of results for legitimate users. The main reactive defenses against model extraction are watermarking (Jia et al., 2020b), dataset inference (Maini et al., 2021), and proof-of-learning (Jia et al., 2021). However, reactive approaches address model extraction post hoc, i.e., after the attack has been completed. We design a proactive defense that prevents model stealing before it succeeds. Specifically, we aim to increase the computational cost of model extraction without lowering the quality of model outputs. Our method is based on the concept of proof-of-work (PoW), and its main steps are presented as a block diagram in Figure 1.
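The per-query calibration can be illustrated with a hashcash-style puzzle whose difficulty scales with how much information a query is estimated to reveal. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function names and the linear difficulty mapping are hypothetical, and `information_cost` is a stand-in for the differential-privacy-based measurement the authors describe.

```python
import hashlib
import secrets

def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros within the first non-zero byte
        break
    return bits

def difficulty_for_query(information_cost: float, base_bits: int = 8, max_bits: int = 28) -> int:
    """Map a per-query information-leakage estimate (placeholder for the
    paper's DP-based measurement) to a required number of leading zero bits.
    Higher estimated leakage -> harder puzzle -> exponentially more hashing."""
    return min(max_bits, base_bits + int(information_cost))

def solve_pow(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce whose hash has `bits` leading zeros."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1

def verify_pow(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: verification costs a single hash regardless of difficulty."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= bits

# Example round trip: the server issues a challenge calibrated to the query;
# the client must solve it before the prediction is released.
challenge = secrets.token_bytes(16)
bits = difficulty_for_query(information_cost=6.0)  # illustrative value
nonce = solve_pow(challenge, bits)
assert verify_pow(challenge, nonce, bits)
```

Verification costs the server a single hash regardless of difficulty, while solving scales exponentially with the number of required zero bits; this asymmetry is what lets the defense impose a large cost (up to 100x) on extraction adversaries while keeping the overhead for benign queries small (up to 2x).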
arXiv.org Artificial Intelligence
Jan-23-2022
- Country:
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.67)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)