

Collaborative Inference




Reimagining Mutual Information for Enhanced Defense against Data Leakage in Collaborative Inference

Neural Information Processing Systems

Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus protecting users' data. Nevertheless, prior research has shown that collaborative inference still results in the exposure of inputs and predictions from edge devices. To defend against such data leakage in collaborative inference, we introduce InfoScissors, a defense strategy designed to reduce the mutual information between a model's intermediate outcomes and the device's inputs and predictions. We evaluate our defense on several datasets against diverse attacks. Beyond the empirical comparison, we provide a theoretical analysis of the inadequacies of recent defense strategies that also utilize mutual information, particularly those based on the Variational Information Bottleneck (VIB) approach. We demonstrate the superiority of our method and provide a theoretical analysis of it.
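The abstract does not spell out the InfoScissors objective, but the general shape of a mutual-information defense can be sketched: the edge encoder is trained with a penalty that upper-bounds I(X; Z) between the input and the intermediate representation. The sketch below uses the standard variational (VIB-style) Gaussian KL bound purely as an illustration of such a penalty; the abstract argues this particular bound is inadequate, and the coefficient `lam` is a hypothetical hyperparameter, not something taken from the paper.

```python
import numpy as np

def kl_gauss_to_std_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims
    # and averaged over the batch -- a variational upper bound on I(X; Z).
    return np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def private_objective(task_loss, mu, log_var, lam=0.1):
    # Total training loss: utility term plus a weighted information penalty
    # that pushes the intermediate representation to leak less about the input.
    return task_loss + lam * kl_gauss_to_std_normal(mu, log_var)
```

When `mu` is zero and `log_var` is zero the encoder already matches the prior, so the penalty vanishes and only the task loss remains; any informative (non-prior) encoding pays a positive cost.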


Posthoc privacy guarantees for collaborative inference with modified Propose-Test-Release

Neural Information Processing Systems

Cloud-based machine learning inference is an emerging paradigm in which users send their data to a service provider, who runs an ML model on it and returns the answer. Due to increased concerns over data privacy, recent works have proposed Collaborative Inference (CI) to learn a privacy-preserving encoding of sensitive user data before it is shared with an untrusted service provider. Existing works so far evaluate the privacy of these encodings through empirical reconstruction attacks. In this work, we develop a new framework that provides formal privacy guarantees for an arbitrarily trained neural network by linking its local Lipschitz constant with its local sensitivity. To guarantee privacy using local sensitivity, we extend the Propose-Test-Release (PTR) framework to make it tractable for neural network queries. We verify the efficacy of our framework experimentally on real-world datasets and elucidate the role of Adversarial Representation Learning (ARL) in improving the privacy-utility trade-off.
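The classic Propose-Test-Release recipe that this work extends can be sketched generically: propose a bound on local sensitivity, privately test that the data is far from any neighbor where the bound fails, and only then release a noisy answer calibrated to the proposed bound. The sketch below is the textbook scalar version, not the paper's neural-network extension; `dist_to_unstable` is a hypothetical oracle standing in for the (generally hard) distance-to-instability computation that the paper makes tractable via Lipschitz constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_test_release(query, data, beta, dist_to_unstable, eps, delta):
    """Classic PTR for a scalar query.

    beta: proposed bound on the local sensitivity of `query` at `data`.
    dist_to_unstable(data, beta): how many records must change before the
        local sensitivity can exceed beta (assumed given by an oracle).
    Returns a noisy answer, or None ("refuse") if the test fails.
    """
    # Test step: the noisy distance must clear the threshold log(1/delta)/eps.
    d_hat = dist_to_unstable(data, beta) + rng.laplace(scale=1.0 / eps)
    if d_hat <= np.log(1.0 / delta) / eps:
        return None
    # Release step: Laplace noise calibrated to the *proposed* bound beta.
    return query(data) + rng.laplace(scale=beta / eps)
```

If the dataset sits right at an instability (distance near zero), the test almost surely fails and the mechanism refuses to answer, which is exactly what gives PTR its (eps, delta) guarantee.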


Action Deviation-Aware Inference for Low-Latency Wireless Robots

Park, Jeyoung, Lim, Yeonsub, Oh, Seungeun, Park, Jihong, Choi, Jinho, Kim, Seong-Lyun

arXiv.org Artificial Intelligence

To support latency-sensitive AI applications ranging from autonomous driving to industrial robot manipulation, 6G envisions distributed ML with computational resources in mobile, edge, and cloud connected over hyper-reliable low-latency communication (HRLLC). In this setting, speculative decoding can facilitate collaborative inference of distributively deployed models: a lightweight on-device model locally generates drafts while a more capable remote target model on a server verifies and corrects them in parallel with speculative sampling, resulting in lower latency without compromising accuracy. However, unlike autoregressive text generation, behavior cloning policies, typically used for embodied AI applications, cannot parallelize verification and correction across multiple drafts, as each generated action depends on an observation updated by the previous action. To this end, we propose Action Deviation-Aware Hybrid Inference (ADAHI), wherein drafts are selectively transmitted and verified based on action deviation, which correlates strongly with the action's rejection probability under the target model. By invoking server operation only when necessary, communication and computational overhead can be reduced while the accuracy gain from speculative sampling is preserved. Experiments on our testbed show that ADAHI reduces transmission and server operations by approximately 40%, lowers end-to-end latency by 39.2%, and attains up to 97.2% of the task-success rate of the baseline that invokes speculative sampling for every draft embedding vector.
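The gating idea behind ADAHI can be sketched in a few lines: measure how far the on-device draft action deviates from the last committed action, and invoke the remote target model only when that deviation is large enough to suggest the draft would likely be rejected. The deviation metric (Euclidean norm), the `threshold` value, and the `server_verify` callback below are all illustrative stand-ins; the abstract does not specify the exact metric or threshold used.

```python
import numpy as np

def should_verify(draft_action, prev_action, threshold):
    # Gate server verification on the action deviation: a draft that barely
    # deviates from the previously committed action is unlikely to be
    # rejected by the target model, so skip the round-trip.
    deviation = np.linalg.norm(draft_action - prev_action)
    return deviation > threshold

def hybrid_step(draft_action, prev_action, server_verify, threshold=0.5):
    """One control step of selective (hybrid) inference.

    server_verify: hypothetical callback standing in for speculative
    verification/correction by the remote target model; returns the
    corrected action. Returns (action, verified_remotely).
    """
    if should_verify(draft_action, prev_action, threshold):
        return server_verify(draft_action), True   # verified on the server
    return draft_action, False                     # committed locally
```

Because each action depends on the observation produced by the previous one, this per-step gate is what lets the system skip server round-trips entirely on low-deviation steps instead of batching drafts as in text-generation speculative decoding.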




CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference

Xu, Guanyu, Hao, Zhiwei, Shen, Li, Luo, Yong, Sun, Fuhui, Wang, Xiaoyan, Hu, Han, Wen, Yonggang

arXiv.org Artificial Intelligence

The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge systems is a significant challenge due to the considerable computational demands and resource requirements of these models. Existing strategies typically either offload transformer computations to other devices or directly deploy compressed models on individual edge devices. To tackle these challenges, we propose a collaborative inference system for general transformer models, termed CoFormer. The central idea behind CoFormer is to exploit the divisibility and integrability of transformers: an off-the-shelf large transformer can be decomposed into multiple smaller models for distributed inference, and their intermediate results are aggregated to generate the final output. We formulate an optimization problem to minimize both inference latency and accuracy degradation under heterogeneous hardware constraints. The DeBo algorithm is proposed to first solve the optimization problem to derive the decomposition policy, and then progressively calibrate the decomposed models to restore performance. We demonstrate the capability to support a wide range of transformer models on heterogeneous edge devices, achieving up to 3.1× inference speedup with large transformer models. Notably, CoFormer enables the efficient inference of GPT2-XL with 1.6 billion parameters on edge devices, reducing memory requirements by 76.3%. CoFormer can also reduce energy consumption by approximately 40% while maintaining satisfactory inference performance. CoFormer significantly outperforms other methods; specifically, it accelerates inference by 3.1× compared to Swin-L [4] with only a 1.7% accuracy drop.

Guanyu Xu, Zhiwei Hao, and Han Hu are with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China. Li Shen is with the School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China. Yong Luo is with the School of Computer Science, National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China. Fuhui Sun and Xiaoyan Wang are with the Information Technology Service Center of People's Court, Beijing 100745, China. Yonggang Wen is with the College of Computing and Data Science, Nanyang Technological University, Singapore 639798.
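The "divisibility and integrability" claim has a concrete, verifiable instance: multi-head attention is an exact sum of per-head contributions through the output projection, so heads can run on different devices and their partial results can simply be added back together. The abstract does not say CoFormer partitions along heads specifically, so the sketch below is only an illustration of why such a decomposition can be lossless before calibration; the round-robin head assignment is an arbitrary choice.

```python
import numpy as np

def head_output(x, w_q, w_k, w_v, w_o):
    # One attention head followed by its slice of the output projection.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(w_q.shape[1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ v) @ w_o

def full_attention(x, heads):
    # Monolithic multi-head attention: sum of per-head contributions.
    return sum(head_output(x, *h) for h in heads)

def distributed_attention(x, heads, n_devices):
    # Decompose: assign heads round-robin to devices, run each partial
    # model independently, then integrate by adding the partial results.
    partials = [sum(head_output(x, *h) for h in heads[d::n_devices])
                for d in range(n_devices)]
    return sum(partials)
```

Because the aggregation is a plain sum, the distributed result matches the monolithic one exactly; in practice, accuracy loss comes from further compressing each partial model, which is what the progressive calibration step would then repair.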


