Goto

Collaborating Authors

 communication overhead



Distributed Distillation for On-Device Learning

Neural Information Processing Systems

On-device learning promises collaborative training of machine learning models across edge devices without the sharing of user data. In state-of-the-art on-device learning algorithms, devices communicate their model weights over a decentralized communication network. Transmitting model weights requires huge communication overhead and means only devices with identical model architectures can be included. To overcome these limitations, we introduce a distributed distillation algorithm where devices communicate and learn from soft-decision (softmax) outputs, which are inherently architecture-agnostic and scale only with the number of classes. The communicated soft-decisions are each model's outputs on a public, unlabeled reference dataset, which serves as a common vocabulary between devices. We prove that our algorithm converges with probability 1 to a stationary point where all devices in the communication network distill the entire network's knowledge on the reference data, regardless of their local connections. Our analysis assumes smooth loss functions, which can be non-convex. Simulations support our theoretical findings and show that even a naive implementation of our algorithm significantly reduces the communication overhead while achieving an overall comparable performance to state-of-the-art, depending on the regime. By requiring little communication overhead and allowing for cross-architecture training, we remove two main obstacles to scaling on-device learning.


DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

Neural Information Processing Systems

Sparse tensors appear frequently in federated deep learning, either as a direct artifact of the deep neural network's gradients, or as a result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored to federated deep learning. DeepReduce decomposes sparse tensors into two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values, or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: curve fitting-based for values, and bloom filter-based for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead. As proof of concept, we implement our approach on TensorFlow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits 320% less data than existing sparsifiers, without affecting accuracy.


Federated Distillation Assisted Vehicle Edge Caching Scheme Based on Lightweight DDPM

Li, Xun, Wu, Qiong, Fan, Pingyi, Wang, Kezhi, Chen, Wen, Letaief, Khaled B.

arXiv.org Artificial Intelligence

Vehicle edge caching is a promising technology that can significantly reduce the latency for vehicle users (VUs) to access content by pre-caching user-interested content at edge nodes. It is crucial to accurately predict the content that VUs are interested in without exposing their privacy. Traditional federated learning (FL) can protect user privacy by sharing models rather than raw data. However, the training of FL requires frequent model transmission, which can result in significant communication overhead. Additionally, vehicles may leave the road side unit (RSU) coverage area before training is completed, leading to training failures. To address these issues, in this letter, we propose a federated distillation-assisted vehicle edge caching scheme based on lightweight denoising diffusion probabilistic model (LDPM). The simulation results demonstrate that the proposed vehicle edge caching scheme has good robustness to variations in vehicle speed, significantly reducing communication overhead and improving cache hit percentage.


Secure and Privacy-Preserving Federated Learning for Next-Generation Underground Mine Safety

Elmahallawy, Mohamed, Madria, Sanjay, Frimpong, Samuel

arXiv.org Artificial Intelligence

Underground mining operations depend on sensor networks to monitor critical parameters such as temperature, gas concentration, and miner movement, enabling timely hazard detection and safety decisions. However, transmitting raw sensor data to a centralized server for machine learning (ML) model training raises serious privacy and security concerns. Federated Learning (FL) offers a promising alternative by enabling decentralized model training without exposing sensitive local data. Yet, applying FL in underground mining presents unique challenges: (i) Adversaries may eavesdrop on shared model updates to launch model inversion or membership inference attacks, compromising data privacy and operational safety; (ii) Non-IID data distributions across mines and sensor noise can hinder model convergence. To address these issues, we propose FedMining--a privacy-preserving FL framework tailored for underground mining. FedMining introduces two core innovations: (1) a Decentralized Functional Encryption (DFE) scheme that keeps local models encrypted, thwarting unauthorized access and inference attacks; and (2) a balancing aggregation mechanism to mitigate data heterogeneity and enhance convergence. Evaluations on real-world mining datasets demonstrate FedMining's ability to safeguard privacy while maintaining high model accuracy and achieving rapid convergence with reduced communication and computation overhead. These advantages make FedMining both secure and practical for real-time underground safety monitoring.


PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance

Ayana, Jifar Wakuma, Qiming, Huang

arXiv.org Artificial Intelligence

Abstract--Large Language Models (LLMs) are emerging as powerful enablers for autonomous reasoning and natural-language coordination in unmanned aerial vehicle (UA V) swarms operating within Internet of Things (IoT) environments. However, existing LLM-driven UA V systems typically process sensor data, mission descriptions, and control outputs in plaintext, exposing sensitive operational information to privacy and security risks. This work introduces PrivLLMSwarm, a privacy-preserving framework that performs secure LLM inference for UA V swarm coordination through Secure Multi-Party Computation (MPC). The framework incorporates MPC-optimized transformer components, including efficient approximations of nonlinear activations and communication-aware attention mechanisms, enabling practical encrypted inference on resource-constrained aerial platforms. A fine-tuned GPT -based command generator, further enhanced through reinforcement learning in a realistic simulation environment, provides reliable natural-language instructions while maintaining end-to-end confidentiality. Experimental evaluation in an urban-scale simulation demonstrates that PrivLLMSwarm achieves high semantic accuracy, low encrypted inference latency, stable formation control, and robust obstacle-avoidance behavior under privacy constraints. Comparative analysis shows that PrivLLMSwarm offers a more favorable privacy-utility balance than differential privacy, federated learning, and plaintext baselines. PrivLLMSwarm establishes a practical foundation for secure, LLM-enabled UA V swarms in privacy-sensitive IoT applications including smart-city monitoring, emergency response, and critical infrastructure protection.


Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding

Koh, Jungyeon, Yang, Hyun Jong

arXiv.org Artificial Intelligence

The growing demand for on-device large language model (LLM) inference highlights the need for efficient mobile edge computing (MEC) solutions, especially in resource-constrained settings. Speculative decoding offers a promising solution by partitioning token generation between a lightweight draft model on mobile devices and a powerful target model on edge servers, but suffers from communication overhead and asynchronous delays. This paper is the first to propose a unified framework that jointly optimizes user association and resource allocation (UARA) to support efficient parallel speculative decoding. We solve the UARA problem using a multi-agent deep reinforcement learning algorithm. To evaluate our approach under realistic conditions, we conduct experiments using the Sionna simulator. Results show that our method achieves up to 28.0% and an average of 23.7% reduction in end-to-end latency without compromising inference accuracy, enabling scalable and low-latency LLM services in MEC systems.


FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

Yao, Yuan, Wang, Lixu, Wu, Jiaqi, Song, Jin, Chen, Simin, Wang, Zehua, Tian, Zijian, Chen, Wei, Li, Huixia, Li, Xiaoxiao

arXiv.org Artificial Intelligence

Federated learning (FL) enables collaborative training across clients without compromising privacy. While most existing FL methods assume homogeneous model architectures, client heterogeneity in data and resources renders this assumption impractical, motivating model-heterogeneous FL. To address this problem, we propose Federated Representation Entanglement (FedRE), a framework built upon a novel form of client knowledge termed entangled representation. In FedRE, each client aggregates its local representations into a single entangled representation using normalized random weights and applies the same weights to integrate the corresponding one-hot label encodings into the entangled-label encoding. Those are then uploaded to the server to train a global classifier. During training, each entangled representation is supervised across categories via its entangled-label encoding, while random weights are resampled each round to introduce diversity, mitigating the global classifier's overconfidence and promoting smoother decision boundaries. Furthermore, each client uploads a single cross-category entangled representation along with its entangled-label encoding, mitigating the risk of representation inversion attacks and reducing communication overhead. Extensive experiments demonstrate that FedRE achieves an effective trade-off among model performance, privacy protection, and communication overhead. The codes are available at https://github.com/AIResearch-Group/FedRE.


FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting

Guo, Jingtao, Mao, Yuyi, Ho, Ivan Wang-Hei

arXiv.org Artificial Intelligence

Wi-Fi channel state information (CSI)-based sensing provides a non-invasive, device-free approach for tasks such as human activity recognition and crowd counting, but large-scale deployment is hindered by the need for extensive site-specific training data. Federated learning (FL) offers a way to avoid raw data sharing but is challenged by heterogeneous sensing data and device resources. This paper proposes FedAPA, a collaborative Wi-Fi CSI-based sensing algorithm that uses adaptive prototype aggregation (APA) strategy to assign similarity-based weights to peer prototypes, enabling adaptive client contributions and yielding a personalized global prototype for each client instead of a fixed-weight aggregation. During local training, we adopt a hybrid objective that combines classification learning with representation contrastive learning to align local and global knowledge. We provide a convergence analysis of FedAPA and evaluate it in a real-world distributed Wi-Fi crowd counting scenario with six environments and up to 20 people. The results show that our method outperform multiple baselines in terms of accuracy, F1 score, mean absolute error (MAE), and communication overhead, with FedAPA achieving at least a 9.65% increase in accuracy, a 9% gain in F1 score, a 0.29 reduction in MAE, and a 95.94% reduction in communication overhead.


Panther: A Cost-Effective Privacy-Preserving Framework for GNN Training and Inference Services in Cloud Environments

Chen, Congcong, Liu, Xinyu, Huang, Kaifeng, Wei, Lifei, Shi, Yang

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have marked significant impact in traffic state prediction, social recommendation, knowledge-aware question answering and so on. As more and more users move towards cloud computing, it has become a critical issue to unleash the power of GNNs while protecting the privacy in cloud environments. Specifically, the training data and inference data for GNNs need to be protected from being stolen by external adversaries. Meanwhile, the financial cost of cloud computing is another primary concern for users. Therefore, although existing studies have proposed privacy-preserving techniques for GNNs in cloud environments, their additional computational and communication overhead remain relatively high, causing high financial costs that limit their widespread adoption among users. To protect GNN privacy while lowering the additional financial costs, we introduce Panther, a cost-effective privacy-preserving framework for GNN training and inference services in cloud environments. Technically, Panther leverages four-party computation to asynchronously executing the secure array access protocol, and randomly pads the neighbor information of GNN nodes. We prove that Panther can protect privacy for both training and inference of GNN models. Our evaluation shows that Panther reduces the training and inference time by an average of 75.28% and 82.80%, respectively, and communication overhead by an average of 52.61% and 50.26% compared with the state-of-the-art, which is estimated to save an average of 55.05% and 59.00% in financial costs (based on on-demand pricing model) for the GNN training and inference process on Google Cloud Platform.