Duan, Lingyu
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Xia, Song, Yu, Yi, Yang, Wenhan, Ding, Meiwen, Chen, Zhuo, Duan, Lingyu, Kot, Alex C., Jiang, Xudong
By locally encoding raw data into intermediate features, collaborative inference enables end users to leverage powerful deep learning models without exposing sensitive raw data to cloud servers. However, recent studies have revealed that these intermediate features may not sufficiently preserve privacy, as information can be leaked and raw data can be reconstructed via model inversion attacks (MIAs). Obfuscation-based methods, such as noise corruption, adversarial representation learning, and information filters, empirically enhance inversion robustness by obfuscating task-irrelevant redundancy. However, methods for quantifying such redundancy remain elusive, and the explicit mathematical relation between minimizing this redundancy and enhancing inversion robustness has not yet been established. To address this gap, this work first theoretically proves that the conditional entropy of the input given the intermediate features provides a guaranteed lower bound on the reconstruction mean square error (MSE) under any MIA. We then derive a differentiable and tractable measure for bounding this conditional entropy based on Gaussian mixture estimation and propose a conditional entropy maximization (CEM) algorithm to enhance inversion robustness. Experimental results on four datasets demonstrate the effectiveness and adaptability of the proposed CEM: without compromising feature utility or computing efficiency, plugging CEM into obfuscation-based defense mechanisms consistently boosts their inversion robustness, achieving average gains ranging from 12.9\% to 48.2\%. Code is available at \href{https://github.com/xiasong0501/CEM}{https://github.com/xiasong0501/CEM}.
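For context, the kind of guarantee described above matches a classical estimation-theoretic bound; the paper's exact theorem, assumptions, and constants may differ. For any inversion attack producing an estimate $\hat{X}(Z)$ of the input $X \in \mathbb{R}^{n}$ from the intermediate feature $Z$,
\[
\frac{1}{n}\,\mathbb{E}\!\left[\lVert X - \hat{X}(Z)\rVert^{2}\right] \;\ge\; \frac{1}{2\pi e}\, e^{\frac{2}{n}\, h(X \mid Z)},
\]
where $h(X \mid Z)$ is the conditional differential entropy. Any defense that increases $h(X \mid Z)$ therefore raises the guaranteed floor on the reconstruction MSE of every MIA, which is the motivation for conditional entropy maximization.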
Transferable Adversarial Attacks on SAM and Its Downstream Models
Xia, Song, Yang, Wenhan, Yu, Yi, Lin, Xun, Ding, Henghui, Duan, Lingyu, Jiang, Xudong
The utilization of large foundation models poses a dilemma: while fine-tuning downstream models from them holds promise for exploiting their well-generalized knowledge in practical applications, their open accessibility also poses threats of adverse usage. This paper, for the first time, explores the feasibility of adversarially attacking various downstream models fine-tuned from the segment anything model (SAM) by solely utilizing information from the open-sourced SAM. In contrast to prevailing transfer-based adversarial attacks, we demonstrate the existence of adversarial dangers even without access to the downstream task and dataset needed to train a similar surrogate model. To enhance the effectiveness of the adversarial attack toward models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm to extract the intrinsic vulnerability inherent in the foundation model, which is then utilized as prior knowledge to guide the generation of adversarial perturbations. Moreover, by formulating the gradient difference between attacking the open-sourced SAM and attacking its fine-tuned downstream models, we theoretically show that directly maximizing the distance of encoded feature embeddings in the open-sourced SAM introduces a deviation in the adversarial update direction. Consequently, we propose a gradient robust loss that simulates the associated uncertainty with gradient-based noise augmentation, making the generated adversarial examples (AEs) robust against this deviation and thus improving transferability. Extensive experiments demonstrate the effectiveness of the proposed universal meta-initialized and gradient robust adversarial attack (UMI-GRAT) toward SAMs and their downstream models.
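As a rough illustration of the gradient-based noise augmentation idea described above (a minimal sketch, not the authors' implementation: the toy encoder, step sizes, noise scale, and function names are placeholder assumptions), a PGD-style loop can maximize the feature distance to the clean embedding while averaging gradients over several noise-perturbed copies of the perturbation:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for SAM's image encoder (the real encoder is far larger).
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
encoder.eval()

def noise_augmented_attack(x, eps=8/255, alpha=2/255, steps=10, k=4, sigma=1e-3):
    """PGD-style attack that maximizes the feature distance to the clean embedding,
    averaging gradients over k noise-augmented copies of the perturbation to
    simulate uncertainty in the downstream model's gradient."""
    with torch.no_grad():
        clean_feat = encoder(x)
    delta = torch.zeros_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        grad = torch.zeros_like(x)
        for _ in range(k):
            # Perturb the current perturbation with small noise before each pass.
            d = (delta + sigma * torch.randn_like(delta)).detach().requires_grad_(True)
            feat = encoder(torch.clamp(x + d, 0, 1))
            dist = (feat - clean_feat).pow(2).sum()
            grad += torch.autograd.grad(dist, d)[0]
        # Ascend on the averaged gradient and project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return torch.clamp(x + delta, 0, 1)

x = torch.rand(1, 3, 64, 64)   # toy input image
x_adv = noise_augmented_attack(x)
```

Averaging over noisy copies approximates an expectation over the unknown gradient deviation, so the resulting perturbation is less sensitive to the mismatch between the open-sourced encoder and a fine-tuned downstream model.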
Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing
Chen, Zhuo, Lin, Weisi, Wang, Shiqi, Duan, Lingyu, Kot, Alex C.
The recent advances of hardware technology have made intelligent analysis equipped at the front end with deep learning more prevailing and practical. To better enable intelligent sensing at the front end, instead of compressing and transmitting visual signals or the ultimately utilized top-layer deep learning features, we propose to compactly represent and convey the intermediate-layer deep learning features, which offer high generalization capability, to facilitate the collaborative approach between the front and cloud ends. This strategy enables a good balance among the computational load, the transmission load, and the generalization ability of cloud servers when deploying deep neural networks for large-scale cloud-based visual analysis. Moreover, the presented strategy also makes the standardization of deep feature coding more feasible and promising, as a series of tasks can simultaneously benefit from the transmitted intermediate layers. We also present evaluation results for lossless deep feature compression with four benchmark data compression methods, which provide meaningful investigations and baselines for future research and standardization activities.

Recently, deep neural networks (DNNs) have demonstrated state-of-the-art performance in various computer vision tasks, e.g., image classification [1], [2], [3], [4], image object detection [5], [6], visual tracking [7], and visual retrieval [8]. In contrast to handcrafted features such as the Scale-Invariant Feature Transform (SIFT) [9], deep learning based approaches are able to learn representative features directly from vast amounts of data. For image classification, a fundamental task of computer vision, the AlexNet model [1] achieved 9% better classification accuracy than the previous handcrafted methods in the 2012 ImageNet competition [10], which provides a large-scale training dataset with 1.2 million images and one thousand categories. Inspired by the progress of AlexNet, DNN models continue to be the undisputed leaders in the ImageNet competition. In particular, both VGGNet [2] and GoogLeNet [11] reported promising performance in the ILSVRC 2014 classification challenge, demonstrating that deeper and wider architectures can bring great benefits in learning better representations from large-scale datasets. In 2016, He et al. proposed residual blocks to enable very deep network structures [3]. With the advances of network infrastructure, cloud-based applications have sprung up in recent years. In particular, front-end devices acquire information from users or the physical world, which is subsequently transmitted to the cloud end (i.e., data centers) for further processing and analysis.
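As a rough illustration of how such a lossless feature-compression evaluation can be set up (a minimal sketch: the layers, quantization scheme, and the four codecs actually benchmarked in the paper are not specified here, and the standard-library codecs zlib, bz2, and lzma are stand-ins), one can serialize a quantized intermediate feature map and measure the compression ratio of each codec:

```python
import bz2, lzma, zlib
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit quantized intermediate feature map with ReLU-like sparsity:
# most activations are zero, which is what general-purpose codecs exploit.
features = rng.integers(0, 256, size=(256, 14, 14)).astype(np.uint8)
features[rng.random(features.shape) < 0.7] = 0
raw = features.tobytes()

codecs = {
    "zlib": lambda b: zlib.compress(b, 9),
    "bz2":  lambda b: bz2.compress(b, 9),
    "lzma": lambda b: lzma.compress(b),
}

for name, compress in codecs.items():
    size = len(compress(raw))
    print(f"{name}: {len(raw) / size:.2f}x compression ratio")
```

Comparing such ratios across layers and codecs gives the kind of baseline numbers that a deep feature coding standardization effort would start from.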