Tang, Xin
Satisfaction-Aware Incentive Scheme for Federated Learning in Industrial Metaverse: DRL-Based Stackbelberg Game Approach
Li, Xiaohuan, Qin, Shaowen, Tang, Xin, Kang, Jiawen, Ye, Jin, Zhao, Zhonghua, Niyato, Dusit
Industrial Metaverse leverages the Industrial Internet of Things (IIoT) to integrate data from diverse devices, employing federated learning and meta-computing to train models in a distributed manner while ensuring data privacy. Achieving an immersive experience for industrial Metaverse necessitates maintaining a balance between model quality and training latency. Consequently, a primary challenge in federated learning tasks is optimizing overall system performance by balancing model quality and training latency. This paper designs a satisfaction function that accounts for data size, Age of Information (AoI), and training latency. Additionally, the satisfaction function is incorporated into the utility functions to incentivize node participation in model training. We model the utility functions of servers and nodes as a two-stage Stackelberg game and employ a deep reinforcement learning approach to learn the Stackelberg equilibrium. This approach ensures balanced rewards and enhances the applicability of the incentive scheme for industrial Metaverse. Simulation results demonstrate that, under the same budget constraints, the proposed incentive scheme improves at least 23.7% utility compared to existing schemes without compromising model accuracy.
DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach
Tang, Xin, Chen, Qian, Weng, Wenjie, Liao, Binhan, Wang, Jiacheng, Cao, Xianbin, Li, Xiaohuan
Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT). The unique capabilities of UAVs give rise to increasingly critical and complex tasks in uncertain and potentially harsh environments. The substantial amount of data generated from these applications necessitates processing and analysis through deep neural networks (DNNs). However, UAVs encounter challenges due to their limited computing resources when managing DNN models. This paper presents a joint approach that combines multiple-agent reinforcement learning (MARL) and generative diffusion models (GDM) for assigning DNN tasks to a UAV swarm, aimed at reducing latency from task capture to result output. To address these challenges, we first consider the task size of the target area to be inspected and the shortest flying path as optimization constraints, employing a greedy algorithm to resolve the subproblem with a focus on minimizing the UAV's flying path and the overall system cost. In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM-MADDPG, which utilizes the reverse denoising process of GDM to replace the actor network in multi-agent deep deterministic policy gradient (MADDPG). This approach generates specific DNN task assignment actions based on agents' observations in a dynamic environment. Simulation results indicate that our algorithm performs favorably compared to benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
Digital Twin-Empowered Task Assignment in Aerial MEC Network: A Resource Coalition Cooperation Approach with Generative Model
Tang, Xin, Chen, Qian, Yu, Rong, Li, Xiaohuan
To meet the demands for ubiquitous communication and temporary edge computing in 6G networks, aerial mobile edge computing (MEC) networks have been envisioned as a new paradigm. However, dynamic user requests pose challenges for task assignment strategies. Most of the existing research assumes that the strategy is deployed on ground-based stations or UAVs, which will be ineffective in an environment lacking infrastructure and continuous energy supply. Moreover, the resource mutual exclusion problem of dynamic task assignment has not been effectively solved. Toward this end, we introduce the digital twin (DT) into the aerial MEC network to study the resource coalition cooperation approach with the generative model (GM), which provides a preliminary coalition structure for the coalition game. Specifically, we propose a novel network framework that is composed of an application plane, a physical plane, and a virtual plane. After that, the task assignment problem is simplified to convex optimization programming with linear constraints. And then, we also propose a resource coalition cooperation approach that is based on a transferable utility (TU) coalition game to obtain an approximate optimal solution. Numerical results confirm the effectiveness of our proposed approach in terms of energy consumption and utilization of resources.
Filter Pruning via Filters Similarity in Consecutive Layers
Wang, Xiaorui, Wang, Jun, Tang, Xin, Gao, Peng, Fang, Rui, Xie, Guotong
Filter pruning is widely adopted to compress and accelerate the Convolutional Neural Networks (CNNs), but most previous works ignore the relationship between filters and channels in different layers. Processing each layer independently fails to utilize the collaborative relationship across layers. In this paper, we intuitively propose a novel pruning method by explicitly leveraging the Filters Similarity in Consecutive Layers (FSCL). FSCL compresses models by pruning filters whose corresponding features are more worthless in the model. The extensive experiments demonstrate the effectiveness of FSCL, and it yields remarkable improvement over state-of-the-art on accuracy, FLOPs and parameter reduction on several benchmark models and datasets.
Stability-based Generalization Analysis for Mixtures of Pointwise and Pairwise Learning
Wang, Jiahuan, Chen, Jun, Chen, Hong, Gu, Bin, Li, Weifu, Tang, Xin
Recently, some mixture algorithms of pointwise and pairwise learning (PPL) have been formulated by employing the hybrid error metric of "pointwise loss + pairwise loss" and have shown empirical effectiveness on feature selection, ranking and recommendation tasks. However, to the best of our knowledge, the learning theory foundation of PPL has not been touched in the existing works. In this paper, we try to fill this theoretical gap by investigating the generalization properties of PPL. After extending the definitions of algorithmic stability to the PPL setting, we establish the high-probability generalization bounds for uniformly stable PPL algorithms. Moreover, explicit convergence rates of stochastic gradient descent (SGD) and regularized risk minimization (RRM) for PPL are stated by developing the stability analysis technique of pairwise learning. In addition, the refined generalization bounds of PPL are obtained by replacing uniform stability with on-average stability.
PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking
Qiao, Yixuan, Chen, Hao, Wang, Jun, Lai, Yongquan, Liu, Tuozhen, Ye, Xianbin, Tang, Xin, Fang, Rui, Gao, Peng, Xie, Wenfeng, Xie, Guotong
This paper describes the PASH participation in TREC 2021 Deep Learning Track. In the recall stage, we adopt a scheme combining sparse and dense retrieval method. In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are used one after another based on model continual pre-trained on general knowledge and document-level data. Compared to TREC 2020 Deep Learning Track, we have additionally introduced the generative model T5 to further enhance the performance.
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Hu, Hu, Yang, Chao-Han Huck, Xia, Xianjun, Bai, Xue, Tang, Xin, Wang, Yajian, Niu, Shutong, Chai, Li, Li, Juanjuan, Zhu, Hongning, Bao, Feng, Zhao, Yuanjun, Siniscalchi, Sabato Marco, Wang, Yannan, Du, Jun, Lee, Chin-Hui
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages on an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Moreover, novel data augmentation schemes for ASC are also investigated. Evaluated on DCASE 2020 Task 1a, our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set, where our best system, a two-stage fusion of CNN ensembles, delivers a 81.9% average accuracy among multi-device test data, and it obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights on the patterns learnt by our models.
Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition
Li, Bingcong, Tang, Xin, Qi, Xianbiao, Chen, Yihao, Xiao, Rong
Recently, inspired by Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that the size of model expands rapidly with the lexicon increasing. Specifically, the number of parameters for softmax classification layer and output embedding layer are proportional to the vocabulary size. It hinders the development of a lightweight text recognition model especially applied for Chinese and multiple languages. Thus, we propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts locality sensitive hashing (LSH) algorithm to encode each character, is proposed to replace the softmax regression and the generated LSH code is directly employed to replace the output embedding. We also present a simplified transformer decoder to reduce the number of parameters by removing the feed-forward network and using cross-layer parameter sharing technique. Compared with traditional methods, the number of parameters in both classification and embedding layers is independent on the size of vocabulary, which significantly reduces the storage requirement without loss of accuracy. Experimental results on several datasets, including four public benchmaks and a Chinese text dataset synthesized by SynthText with more than 20,000 characters, shows that Hamming OCR achieves competitive results.
RGB Video Based Tennis Action Recognition Using a Deep Weighted Long Short-Term Memory
Cai, Jiaxin, Tang, Xin
Action recognition has attracted increasing attention from RGB input in computer vision partially due to potential applications on somatic simulation and statistics of sport such as virtual tennis game and tennis techniques and tactics analysis by video. Recently, deep learning based methods have achieved promising performance for action recognition. In this paper, we propose weighted Long Short-Term Memory adopted with convolutional neural network representations for three dimensional tennis shots recognition. First, the local two-dimensional convolutional neural network spatial representations are extracted from each video frame individually using a pre-trained Inception network. Then, a weighted Long Short-Term Memory decoder is introduced to take the output state at time t and the historical embedding feature at time t-1 to generate feature vector using a score weighting scheme. Finally, we use the adopted CNN and weighted LSTM to map the original visual features into a vector space to generate the spatial-temporal semantical description of visual sequences and classify the action video content. Experiments on the benchmark demonstrate that our method using only simple raw RGB video can achieve better performance than the state-of-the-art baselines for tennis shot recognition.