Goto

Collaborating Authors

 model extraction





Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored recently. ReLU-based DNNs are the most commonly and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are usually invisible, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the attack-target depth grows. In practice, satisfying these assumptions requires an exponential number of queries with respect to the attack depth, implying that the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers to extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.


Stabilizing Data-Free Model Extraction

arXiv.org Artificial Intelligence

Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from the oscillating accuracy of the substitute model. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the substitute model's accuracy oscillation. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted with a few steps to produce data that facilitates the substitute model to learn from the target model while reducing the effect of distribution shifts. Our experiments on popular baseline image datasets, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while exhibiting a more stable substitute model's accuracy during the attack.





CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition

arXiv.org Machine Learning

Graph Neural Networks (GNNs) have demonstrated remarkable utility across diverse applications, and their growing complexity has made Machine Learning as a Service (MLaaS) a viable platform for scalable deployment. However, this accessibility also exposes GNN to serious security threats, most notably model extraction attacks (MEAs), in which adversaries strategically query a deployed model to construct a high-fidelity replica. In this work, we evaluate the vulnerability of GNNs to MEAs and explore their potential for cost-effective model acquisition in non-adversarial research settings. Importantly, adaptive node querying strategies can also serve a critical role in research, particularly when labeling data is expensive or time-consuming. By selectively sampling informative nodes, researchers can train high-performing GNNs with minimal supervision, which is particularly valuable in domains such as biomedicine, where annotations often require expert input. To address this, we propose a node querying strategy tailored to a highly practical yet underexplored scenario, where bulk queries are prohibited, and only a limited set of initial nodes is available. Our approach iteratively refines the node selection mechanism over multiple learning cycles, leveraging historical feedback to improve extraction efficiency. Extensive experiments on benchmark graph datasets demonstrate our superiority over comparable baselines on accuracy, fidelity, and F1 score under strict query-size constraints. These results highlight both the susceptibility of deployed GNNs to extraction attacks and the promise of ethical, efficient GNN acquisition methods to support low-resource research environments.


Explore the vulnerability of black-box models via diffusion models

arXiv.org Artificial Intelligence

Recent advancements in diffusion models have enabled high-fidelity and photorealistic image generation across diverse applications. However, these models also present security and privacy risks, including copyright violations, sensitive information leakage, and the creation of harmful or offensive content that could be exploited maliciously. In this study, we uncover a novel security threat where an attacker leverages diffusion model APIs to generate synthetic images, which are then used to train a high-performing substitute model. This enables the attacker to execute model extraction and transfer-based adversarial attacks on black-box classification models with minimal queries, without needing access to the original training data. The generated images are sufficiently high-resolution and diverse to train a substitute model whose outputs closely match those of the target model. Across the seven benchmarks, including CIFAR and ImageNet subsets, our method shows an average improvement of 27.37% over state-of-the-art methods while using just 0.01 times of the query budget, achieving a 98.68% success rate in adversarial attacks on the target model.