Goto

Collaborating Authors

 model pruning


Advancing Model Pruning via Bi-level Optimization

Neural Information Processing Systems

As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency? To tackle it, we pursue the algorithmic advancement of model pruning.


Closing the Generalization Gap in Parameter-efficient Federated Edge Learning

Du, Xinnong, Lyu, Zhonghao, Cao, Xiaowen, Wen, Chunyang, Cui, Shuguang, Xu, Jie

arXiv.org Artificial Intelligence

Federated edge learning (FEEL) provides a promising foundation for edge artificial intelligence (AI) by enabling collaborative model training while preserving data privacy. However, limited and heterogeneous local datasets, as well as resource-constrained deployment, severely degrade both model generalization and resource utilization, leading to a compromised learning performance. Therefore, we propose a parameter-efficient FEEL framework that jointly leverages model pruning and client selection to tackle such challenges. First, we derive an information-theoretic generalization statement that characterizes the discrepancy between training and testing function losses and embed it into the convergence analysis. It reveals that a larger local generalization statement can undermine the global convergence. Then, we formulate a generalization-aware average squared gradient norm bound minimization problem, by jointly optimizing the pruning ratios, client selection, and communication-computation resources under energy and delay constraints. Despite its non-convexity, the resulting mixed-integer problem is efficiently solved via an alternating optimization algorithm. Extensive experiments demonstrate that the proposed design achieves superior learning performance than state-of-the-art baselines, validating the effectiveness of coupling generalization-aware analysis with system-level optimization for efficient FEEL.



Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles

Pervej, Ferdous, Jin, Richeng, Chowdhury, Md Moin Uddin, Singh, Simran, Güvenç, İsmail, Dai, Huaiyu

arXiv.org Artificial Intelligence

Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such 'newly' sensed data leads to online learning and demands carefully crafting the trajectories. Besides, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm to take the benefits of continual sensed data and limited onboard resources of the ACVs. In particular, considering independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers comparable performances to its non-pruned and nonquantized, hence, computation- and communication-inefficient counterparts.


Advancing Model Pruning via Bi-level Optimization

Neural Information Processing Systems

As illustrated by the Lottery Ticket Hypothesis (L TH), pruning also has the potential of improving their generalization ability. At the core of L TH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find'winning tickets'. Y et, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient'one-shot' pruning methods have been developed but these schemes are usually


Energy-Efficient Federated Learning for Edge Real-Time Vision via Joint Data, Computation, and Communication Design

Hou, Xiangwang, Wang, Jingjing, Guan, Fangming, Du, Jun, Jiang, Chunxiao, Ren, Yong

arXiv.org Artificial Intelligence

--Emerging real-time computer vision (CV) applications on wireless edge devices demand energy-efficient and privacy-preserving learning. Federated learning (FL) enables on-device training without raw data sharing, yet remains challenging in resource-constrained environments due to energy-intensive computation and communication, as well as limited and non-i.i.d. We propose FedDPQ, an ultra energy-efficient FL framework for real-time CV over unreliable wireless networks. FedDPQ integrates diffusion-based data augmentation, model pruning, communication quantization, and transmission power control to enhance training efficiency. It expands local datasets using synthetic data, reduces computation through pruning, compresses updates via quantization, and mitigates transmission outages with adaptive power control. We further derive a closed-form energy-convergence model capturing the coupled impact of these components, and develop a Bayesian optimization(BO)- based algorithm to jointly tune data augmentation strategy, pruning ratio, quantization level, and power control. This work of Xiangwang Hou was supported by the National Natural Science Foundation of China under grant No. 623B2060. This work of Jingjing Wang was partly supported by the National Natural Science Foundation of China under Grant No. 62222101 and No. U24A20213, partly supported by the Beijing Natural Science Foundation under Grants No. L232043 and No. L222039, partly supported by the Natural Science Foundation of Zhejiang Province under Grant No. LMS25F010007 and partly supported by the Fundamental Research Funds for the Central Universities. This work of Jun Du was partly supported by the National Natural Science Foundation China under Grants No. 62422109 and No.U23A20281.


Lightweight Federated Learning over Wireless Edge Networks

Hou, Xiangwang, Wang, Jingjing, Du, Jun, Jiang, Chunxiao, Ren, Yong, Niyato, Dusit

arXiv.org Artificial Intelligence

With the exponential growth of smart devices connected to wireless networks, data production is increasing rapidly, requiring machine learning (ML) techniques to unlock its value. However, the centralized ML paradigm raises concerns over communication overhead and privacy. Federated learning (FL) offers an alternative at the network edge, but practical deployment in wireless networks remains challenging. This paper proposes a lightweight FL (LTFL) framework integrating wireless transmission power control, model pruning, and gradient quantization. We derive a closed-form expression of the FL convergence gap, considering transmission error, model pruning error, and gradient quantization error. Based on these insights, we formulate an optimization problem to minimize the convergence gap while meeting delay and energy constraints. To solve the non-convex problem efficiently, we derive closed-form solutions for the optimal model pruning ratio and gradient quantization level, and employ Bayesian optimization for transmission power control. Extensive experiments on real-world datasets show that LTFL outperforms state-of-the-art schemes.


The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks

Lyu, Zhonghao, Xiao, Ming, Xu, Jie, Skoglund, Mikael, Di Renzo, Marco

arXiv.org Artificial Intelligence

The growing demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In particular, edge-device co-inference, which partitions LAIMs between edge devices and servers, has emerged as a promising strategy for resource-efficient LAIM execution in wireless networks. In this paper, we investigate a pruning-aware LAIM co-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment. For analysis, we first prove that the LAIM output distortion is upper bounded by its parameter distortion. Then, we derive a lower bound on parameter distortion via rate-distortion theory, analytically capturing the relationship between pruning ratio and co-inference performance. Next, based on the analytical results, we formulate an LAIM co-inference distortion bound minimization problem by jointly optimizing the pruning ratio, transmit power, and computation frequency under system latency, energy, and available resource constraints. Moreover, we propose an efficient algorithm to tackle the considered highly non-convex problem. Finally, extensive simulations demonstrate the effectiveness of the proposed design. In particular, model parameter distortion is shown to provide a reliable bound on output distortion. Also, the proposed joint pruning ratio and resource management design achieves superior performance in balancing trade-offs among inference performance, system latency, and energy consumption compared with benchmark schemes, such as fully on-device and on-server inference. Moreover, the split point is shown to play a critical role in system performance optimization under heterogeneous and resource-limited edge environments.


Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning

Hasan, Syed Mhamudul, Zangoti, Hussein, Anagnostopoulos, Iraklis, Shahid, Abdur R.

arXiv.org Artificial Intelligence

Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Things (IoT) environments. These attacks pose serious threats of energy depletion and latency degradation in systems where limited battery capacity and real-time responsiveness are critical for reliable operation. This paper makes two key contributions. First, we present the first systematic exploration of energy-latency sponge attacks targeting sensing-based AI models. Using wearable sensing-based AI as a case study, we demonstrate that sponge attacks can substantially degrade performance by increasing energy consumption, leading to faster battery drain, and by prolonging inference latency. Second, to mitigate such attacks, we investigate model pruning, a widely adopted compression technique for resource-constrained AI, as a potential defense. Our experiments show that pruning-induced sparsity significantly improves model resilience against sponge poisoning. We also quantify the trade-offs between model efficiency and attack resilience, offering insights into the security implications of model compression in sensing-based AI systems deployed in IoT environments.


Uncovering Capabilities of Model Pruning in Graph Contrastive Learning

Wu, Junran, Chen, Xueyuan, Li, Shangzhe

arXiv.org Artificial Intelligence

Graph contrastive learning has achieved great success in pre-training graph neural networks without ground-truth labels. Leading graph contrastive learning follows the classical scheme of contrastive learning, forcing model to identify the essential information from augmented views. However, general augmented views are produced via random corruption or learning, which inevitably leads to semantics alteration. Although domain knowledge guided augmentations alleviate this issue, the generated views are domain specific and undermine the generalization. In this work, motivated by the firm representation ability of sparse model from pruning, we reformulate the problem of graph contrastive learning via contrasting different model versions rather than augmented views. We first theoretically reveal the superiority of model pruning in contrast to data augmentations. In practice, we take original graph as input and dynamically generate a perturbed graph encoder to contrast with the original encoder by pruning its transformation weights. Furthermore, considering the integrity of node embedding in our method, we are capable of developing a local contrastive loss to tackle the hard negative samples that disturb the model training. We extensively validate our method on various benchmarks regarding graph classification via unsupervised and transfer learning. Compared to the state-of-the-art (SOTA) works, better performance can always be obtained by the proposed method.