Xiao, Yunpeng
A Split-Window Transformer for Multi-Model Sequence Spammer Detection using Multi-Model Variational Autoencoder
Yang, Zhou, Pang, Yucai, Yin, Hongbo, Xiao, Yunpeng
This paper introduces a new Transformer, called MS$^2$Dformer, that can be used as a generalized backbone for multi-modal sequence spammer detection. Spammer detection is a complex multi-modal task, thus the challenges of applying Transformer are two-fold. Firstly, complex multi-modal noisy information about users can interfere with feature mining. Secondly, the long sequence of users' historical behaviors also puts a huge GPU memory pressure on the attention computation. To solve these problems, we first design a user behavior Tokenization algorithm based on the multi-modal variational autoencoder (MVAE). Subsequently, a hierarchical split-window multi-head attention (SW/W-MHA) mechanism is proposed. The split-window strategy transforms the ultra-long sequences hierarchically into a combination of intra-window short-term and inter-window overall attention. Pre-trained on the public datasets, MS$^2$Dformer's performance far exceeds the previous state of the art. The experiments demonstrate MS$^2$Dformer's ability to act as a backbone.
Understanding and Tackling Label Errors in Individual-Level Nature Language Understanding
Xiao, Yunpeng, Zhao, Youpeng, Shu, Kai
Natural language understanding (NLU) is a task that enables machines to understand human language. Some tasks, such as stance detection and sentiment analysis, are closely related to individual subjective perspectives, thus termed individual-level NLU. Previously, these tasks are often simplified to text-level NLU tasks, ignoring individual factors. This not only makes inference difficult and unexplainable but often results in a large number of label errors when creating datasets. To address the above limitations, we propose a new NLU annotation guideline based on individual-level factors. Specifically, we incorporate other posts by the same individual and then annotate individual subjective perspectives after considering all individual posts. We use this guideline to expand and re-annotate the stance detection and topic-based sentiment analysis datasets. We find that error rates in the samples were as high as 31.7\% and 23.3\%. We further use large language models to conduct experiments on the re-annotation datasets and find that the large language models perform well on both datasets after adding individual factors. Both GPT-4o and Llama3-70B can achieve an accuracy greater than 87\% on the re-annotation datasets. We also verify the effectiveness of individual factors through ablation studies. We call on future researchers to add individual factors when creating such datasets. Our re-annotation dataset can be found at https://github.com/24yearsoldstudent/Individual-NLU
Marking the Pace: A Blockchain-Enhanced Privacy-Traceable Strategy for Federated Recommender Systems
Cai, Zhen, Tang, Tao, Yu, Shuo, Xiao, Yunpeng, Xia, Feng
Federated recommender systems have been crucially enhanced through data sharing and continuous model updates, attributed to the pervasive connectivity and distributed computing capabilities of Internet of Things (IoT) devices. Given the sensitivity of IoT data, transparent data processing in data sharing and model updates is paramount. However, existing methods fall short in tracing the flow of shared data and the evolution of model updates. Consequently, data sharing is vulnerable to exploitation by malicious entities, raising significant data privacy concerns, while excluding data sharing will result in sub-optimal recommendations. To mitigate these concerns, we present LIBERATE, a privacy-traceable federated recommender system. We design a blockchain-based traceability mechanism, ensuring data privacy during data sharing and model updates. We further enhance privacy protection by incorporating local differential privacy in user-server communication. Extensive evaluations with the real-world dataset corroborate LIBERATE's capabilities in ensuring data privacy during data sharing and model update while maintaining efficiency and performance. Results underscore blockchain-based traceability mechanism as a promising solution for privacy-preserving in federated recommender systems.
Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning
Wang, Rong, Zhou, Guichen, Gao, Mingjun, Xiao, Yunpeng
In recent years, the neural network backdoor hidden in the parameters of the federated learning model has been proved to have great security risks. Considering the characteristics of trigger generation, data poisoning and model training in backdoor attack, this paper designs a backdoor attack method based on federated learning. Firstly, aiming at the concealment of the backdoor trigger, a TrojanGan steganography model with encoder-decoder structure is designed. The model can encode specific attack information as invisible noise and attach it to the image as a backdoor trigger, which improves the concealment and data transformations of the backdoor trigger.Secondly, aiming at the problem of single backdoor trigger mode, an image poisoning attack method called combination trigger attack is proposed. This method realizes multi-backdoor triggering by multiplexing combined triggers and improves the robustness of backdoor attacks. Finally, aiming at the problem that the local training mechanism leads to the decrease of the success rate of backdoor attack, a dual model replacement backdoor attack algorithm based on federated learning is designed. This method can improve the success rate of backdoor attack while maintaining the performance of the federated learning aggregation model. Experiments show that the attack strategy in this paper can not only achieve high backdoor concealment and diversification of trigger forms under federated learning, but also achieve good attack success rate in multi-target attacks.door concealment and diversification of trigger forms but also achieve good results in multi-target attacks.
Discovery of Important Crossroads in Road Network using Massive Taxi Trajectories
Xu, Ming, Wu, Jianping, Du, Yiman, Wang, Haohan, Qi, Geqi, Hu, Kezhen, Xiao, Yunpeng
A major problem in road network analysis is discovery of important crossroads, which can provide useful information for transport planning. However, none of existing approaches addresses the problem of identifying network-wide important crossroads in real road network. In this paper, we propose a novel data-driven based approach named CRRank to rank important crossroads. Our key innovation is that we model the trip network reflecting real travel demands with a tripartite graph, instead of solely analysis on the topology of road network. To compute the importance scores of crossroads accurately, we propose a HITS-like ranking algorithm, in which a procedure of score propagation on our tripartite graph is performed. We conduct experiments on CRRank using a real-world dataset of taxi trajectories. Experiments verify the utility of CRRank.