
Collaborating Authors: Shen, Xuemin


MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning

arXiv.org Artificial Intelligence

Large language models (LLMs) on mobile devices and their potential applications never fail to fascinate. However, on-device LLM fine-tuning poses great challenges due to extremely high memory requirements and slow training speeds. Even parameter-efficient fine-tuning (PEFT) methods, which update only a small subset of parameters, are beyond the means of resource-constrained mobile devices. In this paper, we propose MobiLLM to enable memory-efficient transformer LLM fine-tuning on a mobile device via server-assisted side-tuning. Particularly, MobiLLM allows the resource-constrained mobile device to retain merely a frozen backbone model, while offloading the memory- and computation-intensive backpropagation of a trainable side-network to a high-performance server. Unlike existing fine-tuning methods that keep trainable parameters inside the frozen backbone, MobiLLM separates a set of parallel adapters from the backbone to create a backpropagation bypass, involving only one-way activation transfers from the mobile device to the server, with low-bit-width quantization applied during forward propagation. In this way, the data never leaves the mobile device, the device avoids backpropagation through the local backbone model, and its forward propagation can be parallelized with the server-side execution. Thus, MobiLLM preserves data privacy while significantly reducing the memory and computational burdens of LLM fine-tuning. Through extensive experiments, we demonstrate that MobiLLM can enable a resource-constrained mobile device, even a CPU-only one, to fine-tune LLMs while significantly reducing convergence time and memory usage.
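
The split described above can be illustrated with a small sketch. The following PyTorch toy (all module shapes and names are our own illustration, not MobiLLM's actual architecture) shows the core idea: the device runs only the frozen backbone forward pass and ships int8-quantized activations uplink, while the server holds the trainable parallel adapters and keeps all backpropagation on its side.

```python
import torch
import torch.nn as nn

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization for the uplink activation transfer."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    return (x / scale).round().clamp(-127, 127).to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

hidden = 64
backbone = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(4))
for p in backbone.parameters():
    p.requires_grad_(False)                 # device keeps only a frozen backbone

# Server side: one small trainable adapter per backbone layer, plus a task head.
adapters = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(4))
head = nn.Linear(hidden, 2)
opt = torch.optim.Adam(list(adapters.parameters()) + list(head.parameters()))

x = torch.randn(8, hidden)                  # private data: never leaves the device
labels = torch.randint(0, 2, (8,))

# --- device: forward pass only, quantize each layer's activation for uplink ---
uplink = []
h = x
for layer in backbone:
    h = torch.relu(layer(h))
    uplink.append(quantize_int8(h.detach()))

# --- server: side-network consumes dequantized activations; backprop stays here ---
s = torch.zeros(8, hidden)
for (q, scale), adapter in zip(uplink, adapters):
    s = s + adapter(dequantize(q, scale))
loss = nn.functional.cross_entropy(head(s), labels)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the device never builds a backward graph, its memory footprint stays close to inference-only levels, which is what lets even a CPU-only device participate.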


Deploying Foundation Model Powered Agent Services: A Survey

arXiv.org Artificial Intelligence

Foundation model (FM) powered agent services are regarded as a promising solution for developing intelligent and personalized applications, advancing toward Artificial General Intelligence (AGI). To achieve high reliability and scalability in deploying these agent services, it is essential to collaboratively optimize computational and communication resources, thereby ensuring effective resource allocation and seamless service delivery. In pursuit of this vision, this paper proposes a unified framework aimed at providing a comprehensive survey on deploying FM-based agent services across heterogeneous devices, with an emphasis on the integration of model and resource optimization to establish a robust infrastructure for these services. Particularly, this paper begins by exploring various low-level optimization strategies during inference and studies approaches that enhance system scalability, such as parallelism techniques and resource scaling methods. The paper then discusses several prominent FMs and investigates research efforts focused on inference acceleration, including techniques such as model compression and token reduction. Moreover, the paper also investigates critical components for constructing agent services and highlights notable intelligent applications. Finally, the paper presents potential research directions for developing real-time agent services with high Quality of Service (QoS).


User-centric Immersive Communications in 6G: A Data-oriented Approach via Digital Twin

arXiv.org Artificial Intelligence

In this article, we present a novel user-centric service provision for immersive communications (IC) in 6G to deal with the uncertainty of individual user behaviors while satisfying unique requirements on the quality of multi-sensory experience. To this end, we propose a data-oriented approach for network resource management, featuring personalized data management that can support network modeling tailored to different user demands. Our approach leverages the digital twin (DT) technique as a key enabler. Particularly, a DT is established for each user, and the data attributes in the DT are customized based on the characteristics of the user. The DT functions, corresponding to various data operations, are customized in the development, evaluation, and update of network models to meet unique user demands. A trace-driven case study demonstrates the effectiveness of our approach in achieving user-centric IC and the significance of personalized data management in 6G.
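
As a rough illustration of the personalized data management idea, the following sketch (all field and function names are hypothetical, not the paper's design) models a per-user DT whose data attributes and data-operation functions are customized to that user:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class UserDigitalTwin:
    user_id: str
    # Data attributes customized to the user's behavior and QoE demands.
    attributes: Dict[str, float] = field(default_factory=dict)
    behavior_trace: List[dict] = field(default_factory=list)
    # DT functions: named data operations selected per user and invoked
    # during network model development, evaluation, and update.
    functions: Dict[str, Callable[[List[dict]], dict]] = field(default_factory=dict)

    def run(self, name: str) -> dict:
        return self.functions[name](self.behavior_trace)

# Usage: a motion-sensitive VR user gets a head-pose aggregation function;
# another user's DT might register entirely different data operations.
dt = UserDigitalTwin(user_id="u42", attributes={"motion_sensitivity": 0.9})
dt.behavior_trace = [{"yaw": 0.1}, {"yaw": 0.3}]
dt.functions["mean_yaw"] = lambda trace: {"mean_yaw": sum(r["yaw"] for r in trace) / len(trace)}
print(dt.run("mean_yaw"))   # -> {'mean_yaw': 0.2} (approximately)
```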


Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks

arXiv.org Artificial Intelligence

Semantic Communication (SemCom) plays a pivotal role in 6G networks, offering a viable solution for future efficient communication. Deep Learning (DL)-based semantic codecs further enhance this efficiency. However, the vulnerability of DL models to security threats, such as adversarial attacks, poses significant challenges for practical applications of SemCom systems. These vulnerabilities enable attackers to tamper with messages and eavesdrop on private information, especially in wireless communication scenarios. Although existing defenses attempt to address specific threats, they often fail to simultaneously handle multiple heterogeneous attacks. To overcome this limitation, we introduce a novel Mixture-of-Experts (MoE)-based SemCom system. This system comprises a gating network and multiple experts, each specializing in different security challenges. The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements. Multiple experts collaborate to accomplish semantic communication tasks while meeting the security requirements of users. A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system. Simulation results show that the proposed MoE-based SemCom system effectively mitigates concurrent heterogeneous attacks, with minimal impact on downstream task accuracy.
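
The gating mechanism can be sketched in a few lines. This PyTorch toy (dimensions and module names are our own assumptions, not the paper's architecture) shows a gate that conditions on both the received semantic features and a user-defined security-requirement vector before mixing the defense experts:

```python
import torch
import torch.nn as nn

class MoESemComDecoder(nn.Module):
    def __init__(self, feat_dim=32, req_dim=3, n_experts=3, out_dim=16):
        super().__init__()
        # One expert per threat type (e.g., perturbation, tampering, eavesdropping).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
            for _ in range(n_experts))
        # The gate sees the features AND the user's security requirements.
        self.gate = nn.Linear(feat_dim + req_dim, n_experts)

    def forward(self, feats, security_req):
        w = torch.softmax(self.gate(torch.cat([feats, security_req], dim=-1)), dim=-1)
        outs = torch.stack([e(feats) for e in self.experts], dim=1)  # (B, E, D)
        return (w.unsqueeze(-1) * outs).sum(dim=1)                   # weighted mix

decoder = MoESemComDecoder()
feats = torch.randn(4, 32)                        # received semantic features
req = torch.tensor([[1., 0., 1.]]).expand(4, -1)  # user flags two threat types
print(decoder(feats, req).shape)                  # torch.Size([4, 16])
```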


Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

arXiv.org Artificial Intelligence

This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since traditional RL methods face several common challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration phase. To deal with these challenges, a comprehensive DT-based framework is proposed to enhance the convergence speed and performance of RL-based resource management. The proposed framework provides safe action exploration, more accurate estimates of long-term returns, faster training convergence, higher convergence performance, and real-time adaptation to varying network conditions. Then, two case studies on ultra-reliable and low-latency communication (URLLC) services and multi-unmanned aerial vehicle (UAV) networks are presented, demonstrating the improvements of the proposed framework in performance, convergence speed, and training cost reduction over both traditional RL and neural-network-based deep RL (DRL). Finally, the article identifies and explores some of the research challenges and open issues in this rapidly evolving field.
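
As a toy illustration of the workflow (the environments and constants below are stand-ins, not the paper's models), the following sketch explores in a DT replica first, uses the emulated rollouts to estimate long-term returns, and lets only DT-vetted actions reach the physical network:

```python
import random

class TwinEnv:
    """DT replica of the network: cheap, risk-free reward emulation."""
    def step(self, a):
        return random.gauss(1.0 - abs(a - 0.6), 0.05)

class PhysicalEnv:
    """The real network: costly and unsafe to probe blindly."""
    def step(self, a):
        return random.gauss(1.0 - abs(a - 0.6), 0.2)

twin, real = TwinEnv(), PhysicalEnv()
actions = [i / 10 for i in range(11)]        # discretized resource allocations

# 1) Safe exploration and long-term return estimation inside the DT.
est = {a: sum(twin.step(a) for _ in range(50)) / 50 for a in actions}

# 2) Only DT-vetted actions (emulated return above a floor) ever reach the
#    physical network, which is used merely to fine-tune the estimates.
safe = [a for a in actions if est[a] > 0.5]
q = dict(est)
for _ in range(100):
    a = random.choice(safe) if random.random() < 0.1 else max(safe, key=q.get)
    q[a] += 0.05 * (real.step(a) - q[a])     # incremental value update

print("chosen allocation:", max(safe, key=q.get))
```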


Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

arXiv.org Artificial Intelligence

Distributed Artificial Intelligence (AI) model training over mobile edge networks encounters significant challenges due to the data and resource heterogeneity of edge devices. The former hampers the convergence rate of the global model, while the latter diminishes the devices' resource utilization efficiency. In this paper, we propose a generative AI-empowered federated learning scheme to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data. Specifically, FIMI can be considered a resource-aware data augmentation method that effectively mitigates data heterogeneity while ensuring efficient FL training. We first quantify the relationship between the amount of training data and the learning performance. We then study the FIMI optimization problem with the objective of minimizing the device-side overall energy consumption subject to required learning performance constraints. Decomposition-based analysis and the cross-entropy search method are leveraged to derive the solution, where each device is assigned a suitable amount of AI-synthesized data and a resource utilization policy. Experimental results demonstrate that FIMI can save up to 50% of the device-side energy needed to achieve the target global test accuracy in comparison with existing methods. Meanwhile, FIMI can significantly enhance the converged global accuracy under non-independent-and-identically-distributed (non-IID) data.
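
To make the cross-entropy search concrete, here is a minimal sketch (the energy and accuracy surrogates are made up for illustration; the paper derives a quantified data-performance relationship): sample candidate per-device synthetic-data amounts, keep the lowest-energy feasible elites, and refit the sampling distribution.

```python
import random

N_DEV, TARGET_ACC = 4, 0.80

def energy(x):      # device-side cost grows with assigned synthetic data
    return sum(0.5 * xi + 0.02 * xi ** 2 for xi in x)

def accuracy(x):    # surrogate: more (balanced) data -> better global model
    return min(0.95, 0.6 + 0.03 * sum(x) / N_DEV)

mu, sigma = [5.0] * N_DEV, [3.0] * N_DEV
for _ in range(30):                               # cross-entropy iterations
    samples = [[max(0.0, random.gauss(m, s)) for m, s in zip(mu, sigma)]
               for _ in range(200)]
    feasible = [x for x in samples if accuracy(x) >= TARGET_ACC]
    if not feasible:
        continue
    elite = sorted(feasible, key=energy)[:20]     # lowest-energy feasible set
    for i in range(N_DEV):                        # refit the per-device Gaussian
        col = [x[i] for x in elite]
        mu[i] = sum(col) / len(col)
        sigma[i] = max(0.1, (sum((c - mu[i]) ** 2 for c in col) / len(col)) ** 0.5)

print("synthetic-data assignment per device:", [round(m, 2) for m in mu])
print("energy:", round(energy(mu), 2), "accuracy:", round(accuracy(mu), 3))
```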


Scalable Resource Management for Dynamic MEC: An Unsupervised Link-Output Graph Neural Network Approach

arXiv.org Artificial Intelligence

Deep learning has been successfully adopted in mobile edge computing (MEC) to optimize task offloading and resource allocation. However, the dynamics of edge networks raise two challenges for neural network (NN)-based optimization methods: low scalability and high training costs. Although conventional node-output graph neural networks (GNNs) can extract features of edge nodes when the network scales, they fail to handle a new scalability issue in which the dimension of the decision space may change as the network scales. To address this issue, a novel link-output GNN (LOGNN)-based resource management approach is proposed in this paper to flexibly optimize resource allocation in MEC for an arbitrary number of edge nodes with extremely low inference delay. Moreover, a label-free unsupervised method is applied to train the LOGNN efficiently, where the gradient of the edge task processing delay with respect to the LOGNN parameters is derived explicitly. In addition, a theoretical analysis of the scalability of node-output and link-output GNNs is performed. Simulation results show that the proposed LOGNN can efficiently optimize MEC resource allocation in a scalable way, with an arbitrary number of servers and users. In addition, the proposed unsupervised training method achieves better convergence performance and speed than supervised learning and reinforcement learning-based training methods. The code is available at \url{https://github.com/UNIC-Lab/LOGNN}.
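
The link-output idea can be sketched in a few lines of PyTorch (shapes and layer choices here are our own simplification, not the released LOGNN code): after message passing, each decision is read off an edge embedding, so the output dimension tracks the number of links rather than a fixed action space.

```python
import torch
import torch.nn as nn

class LinkOutputGNN(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.node_enc = nn.Linear(2, d)           # per-node input features
        self.msg = nn.Linear(2 * d, d)            # message over each edge
        self.edge_out = nn.Linear(2 * d, 1)       # one scalar per link

    def forward(self, x, edges):
        h = torch.relu(self.node_enc(x))
        src, dst = edges[:, 0], edges[:, 1]
        m = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
        # Aggregate incoming messages at destination nodes (mean pooling).
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        deg = torch.zeros(len(x), 1).index_add_(0, dst, torch.ones(len(dst), 1))
        h = h + agg / deg.clamp(min=1)
        # Link-level readout: an allocation fraction per (server, user) edge.
        return torch.sigmoid(self.edge_out(torch.cat([h[src], h[dst]], dim=-1)))

gnn = LinkOutputGNN()
for n_users in (3, 7):                            # arbitrary network sizes
    x = torch.randn(1 + n_users, 2)               # one server + n_users users
    edges = torch.tensor([[0, u] for u in range(1, 1 + n_users)])
    print(gnn(x, edges).shape)                    # (n_users, 1): output scales
```

For the unsupervised training the abstract mentions, a differentiable delay model over these link outputs would serve directly as the loss, so no labels are required.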


Digital Twin-Based 3D Map Management for Edge-Assisted Mobile Augmented Reality

arXiv.org Artificial Intelligence

In this paper, we design a 3D map management scheme for edge-assisted mobile augmented reality (MAR) to support the pose estimation of an individual MAR device, which uploads camera frames to an edge server. Our objective is to minimize the pose estimation uncertainty of the MAR device by periodically selecting a proper set of camera frames for uploading to update the 3D map. To address the challenges of the dynamic uplink data rate and the time-varying pose of the MAR device, we propose a digital twin (DT)-based approach to 3D map management. First, a DT is created for the MAR device, which emulates 3D map management based on predicted subsequent camera frames. Second, a model-based reinforcement learning (MBRL) algorithm is developed that utilizes both the actual data and the data emulated by the DT to manage the 3D map. With extensive emulated data provided by the DT, the MBRL algorithm can quickly provide an adaptive map management policy in a highly dynamic environment. Simulation results demonstrate that the proposed DT-based 3D map management outperforms benchmark schemes by achieving lower pose estimation uncertainty and higher data efficiency in dynamic environments.
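
A toy version of the data mix (all dynamics and constants below are illustrative, not the paper's models): the DT emulates many map-update transitions from predicted frames, and the learner combines them with far scarcer real transitions to choose a frame-upload policy.

```python
import random

def real_transition(upload_threshold):
    """Real rollout: upload frames whose view change exceeds the threshold."""
    uploaded = sum(1 for _ in range(30) if random.random() > upload_threshold)
    uncertainty = 1.0 / (1 + uploaded) + 0.02 * uploaded  # pose-error proxy
    return -uncertainty                                   # reward

def dt_emulated_transition(upload_threshold):
    """DT: the same dynamics emulated from predicted frames, with model bias."""
    return real_transition(upload_threshold) + random.gauss(0, 0.01)

thresholds = [i / 10 for i in range(10)]
returns = {t: [] for t in thresholds}
for t in thresholds:
    returns[t] += [dt_emulated_transition(t) for _ in range(200)]  # cheap, plentiful
    returns[t] += [real_transition(t) for _ in range(5)]           # costly, scarce

best = max(thresholds, key=lambda t: sum(returns[t]) / len(returns[t]))
print("selected upload threshold:", best)
```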


Semantic Information Marketing in The Metaverse: A Learning-Based Contract Theory Framework

arXiv.org Artificial Intelligence

In this paper, we address the problem of designing incentive mechanisms by a virtual service provider (VSP) to hire sensing IoT devices to sell their sensing data, which helps create and render the digital copy of the physical world in the Metaverse. Due to the limited bandwidth, we propose to use semantic extraction algorithms to reduce the amount of data delivered by the sensing IoT devices. Nevertheless, mechanisms that hire sensing IoT devices to share their data with the VSP and then deliver the constructed digital twin to the Metaverse users are vulnerable to the adverse selection problem. The adverse selection problem, which is caused by information asymmetry between the system entities, becomes harder to solve when the private information of the different entities is multi-dimensional. We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem. To demonstrate the effectiveness of our algorithm, we conduct extensive simulations and measure several key performance metrics of the contract for the Metaverse. Our results show that our designed iterative contract is able to incentivize the participants to interact truthfully, maximizing the profit of the VSP with minimal individual rationality (IR) and incentive compatibility (IC) violation rates. Furthermore, the proposed learning-based iterative contract framework has only limited access to the private information of the participants, which is, to the best of our knowledge, the first of its kind in addressing the adverse selection problem in incentive mechanisms.
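
For readers unfamiliar with the IR/IC metrics mentioned above, the following sketch (the utility model and numbers are made up) checks a toy contract menu: IR holds if each device type gains non-negative utility from its own contract item, and IC holds if no type prefers another type's item.

```python
def utility(theta, item):
    """Utility of a device of type theta accepting contract item (quality, reward)."""
    quality, reward = item
    return reward - quality / theta            # higher types sense more cheaply

# Toy menu: type -> (required data quality, reward). Numbers are illustrative.
contract = {1.0: (1.0, 1.2), 2.0: (2.0, 1.8)}

def violation_rates(contract):
    types = list(contract)
    # IR: each type must gain non-negative utility from its own item.
    ir = sum(utility(t, contract[t]) < 0 for t in types) / len(types)
    # IC: no type may prefer an item designed for another type.
    ic = sum(utility(t, contract[o]) > utility(t, contract[t])
             for t in types for o in types if o != t) / (len(types) * (len(types) - 1))
    return ir, ic

print(violation_rates(contract))               # (0.0, 0.0) for this menu
```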


Short-term Road Traffic Prediction based on Deep Cluster at Large-scale Networks

arXiv.org Machine Learning

Short-term road traffic prediction (STTP) is one of the most important modules in Intelligent Transportation Systems (ITS). However, network-level STTP remains challenging due to the difficulties both in modeling the diverse traffic patterns and in tackling high-dimensional time series with low latency. Therefore, in this paper, a framework incorporating a deep clustering (DeepCluster) module is developed for STTP over large-scale networks. The DeepCluster module is proposed to supervise representation learning in a visualized way from the large unlabeled dataset. More specifically, to fully exploit traffic periodicity, the raw series is first split into a number of sub-series for triplet generation. Convolutional neural networks (CNNs) with triplet loss are utilized to extract shape features by transforming the series into visual images. The shape-based representations are then used for road segment clustering. Thereafter, motivated by the fact that the road segments in a group have similar patterns, a model-sharing strategy is further proposed to build recurrent NN (RNN)-based predictions through a group-based model (GM), instead of an individual-based model (IM) in which one model is built exclusively for each road. Our framework can not only significantly reduce the number of models and the associated cost, but also increase the amount of training data and the diversity of samples. Finally, we evaluate the proposed framework on the network around Liuli Bridge in Beijing. Experimental results show that DeepCluster can effectively cluster the road segments, and the GM achieves performance comparable to the IM with fewer models.
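
A minimal sketch of the triplet setup (PyTorch, with toy sizes and synthetic series standing in for real traffic data): sub-series from the same road segment at different periods form the anchor/positive pair, a differently shaped segment supplies the negative, and a small CNN is trained with triplet loss to embed shape.

```python
import torch
import torch.nn as nn

class ShapeCNN(nn.Module):
    def __init__(self, emb=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4), nn.Flatten(), nn.Linear(32, emb))

    def forward(self, x):                        # x: (B, 1, T) traffic sub-series
        return self.net(x)

enc = ShapeCNN()
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

T = 48                                           # e.g., one day at 30-min bins
segment_a = lambda: torch.sin(torch.linspace(0, 6.28, T)) + 0.1 * torch.randn(T)
segment_b = lambda: torch.rand(T)                # a differently shaped segment

for _ in range(100):
    anchor = segment_a().view(1, 1, T)           # same road, one period
    positive = segment_a().view(1, 1, T)         # same road, another period
    negative = segment_b().view(1, 1, T)         # different road segment
    loss = loss_fn(enc(anchor), enc(positive), enc(negative))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The learned embeddings are then clustered; each cluster shares one
# RNN-based group model (GM) instead of one model per road segment.
```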