Feng, Zhiyong
Gradient Co-occurrence Analysis for Detecting Unsafe Prompts in Large Language Models
Yang, Jingyuan, Yan, Bowen, Li, Rongjun, Zhou, Ziyu, Chen, Xin, Feng, Zhiyong, Peng, Wei
Unsafe prompts pose significant safety risks to large language models (LLMs). Existing methods for detecting unsafe prompts rely on data-driven fine-tuning to train guardrail models, necessitating significant data and computational resources. In contrast, recently proposed few-shot gradient-based methods require only a few safe and unsafe reference prompts. Such an approach identifies unsafe prompts by analyzing consistent patterns in the gradients of safety-critical parameters in LLMs. Although effective, its restriction to directional similarity (cosine similarity) introduces ``directional bias'', limiting its capability to identify unsafe prompts. To overcome this limitation, we introduce GradCoo, a novel gradient co-occurrence analysis method that expands the scope of safety-critical parameter identification to include unsigned gradient similarity, thereby reducing the impact of ``directional bias'' and enhancing the accuracy of unsafe prompt detection. Comprehensive experiments on the widely used benchmark datasets ToxicChat and XSTest demonstrate that our proposed method achieves state-of-the-art (SOTA) performance compared to existing methods. Moreover, we confirm the generalizability of GradCoo in detecting unsafe prompts across a range of LLM base models of various sizes and origins.
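The ``directional bias'' mentioned in the abstract can be illustrated with a toy example. This is a minimal sketch, not the GradCoo algorithm: it assumes, purely for illustration, that an unsigned similarity can be obtained by taking cosine similarity over element-wise absolute values. The function names and the toy gradient vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Signed (directional) cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unsigned_cosine(u, v):
    """Cosine similarity over absolute values (an illustrative assumption)."""
    return cosine([abs(a) for a in u], [abs(b) for b in v])

# Toy gradients: the same parameters are active, but with opposite signs.
ref_grad = [0.9, -0.8, 0.0, 0.1]
new_grad = [-0.9, 0.8, 0.0, -0.1]

signed = cosine(ref_grad, new_grad)             # close to -1: looks maximally dissimilar
unsigned = unsigned_cosine(ref_grad, new_grad)  # close to +1: same parameters are involved
```

In this toy case the signed measure reports the two gradients as opposite even though they touch the same parameters with the same magnitudes, which is the kind of pattern a purely directional score would miss.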
LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models
Yang, Jingyuan, Li, Rongjun, Wang, Weixuan, Zhou, Ziyu, Feng, Zhiyong, Peng, Wei
Large Language Models (LLMs) often generate inconsistent responses when prompted with semantically equivalent paraphrased inputs. Recently, activation steering, a technique that modulates LLMs' behaviours by adjusting their latent representations during inference time, has been explored to improve the semantic consistency of LLMs. However, these methods typically operate at the model component level, such as layer hidden states or attention head outputs. They face a challenge due to the ``polysemanticity issue'', where the model components of LLMs typically encode multiple entangled features, making precise steering difficult. To address this challenge, we drill down to feature-level representations and propose LF-Steering, a novel activation steering approach to precisely identify latent feature representations responsible for semantic inconsistency. More specifically, our method maps the hidden states of the relevant transformer layer into a sparsely activated, high-dimensional feature space based on a sparse autoencoder (SAE), ensuring model steering based on decoupled feature representations with minimal interference. Comprehensive experiments on NLU and NLG datasets demonstrate the effectiveness of our method in enhancing semantic consistency, resulting in significant performance gains for various NLU and NLG tasks.
Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
Yang, Jingyuan, Chen, Dapeng, Sun, Yajing, Li, Rongjun, Feng, Zhiyong, Peng, Wei
A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model on prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a ``black box'', restricting our ability to gain deeper insights into its internal mechanism. In this paper, we instead enhance the semantic consistency of LLMs through a more interpretable method, namely model editing. We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, as they do not rely on large-scale manipulation of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.
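The bias-injection step described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the activation direction is the normalized difference between mean head outputs on consistent and inconsistent examples, which the abstract does not specify. All names (`unit_direction`, `inject_bias`, `strength`) and the two-dimensional toy activations are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit_direction(consistent, inconsistent):
    """Assumed direction: normalized difference of the two class means."""
    diff = [a - b for a, b in zip(mean_vector(consistent), mean_vector(inconsistent))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def inject_bias(head_output, direction, strength):
    """Shift one attention-head output along the direction; weights are untouched."""
    return [h + strength * d for h, d in zip(head_output, direction)]

# Toy head outputs collected on consistent vs. inconsistent prompt pairs.
consistent = [[1.0, 0.0], [3.0, 0.0]]
inconsistent = [[0.0, 0.0], [0.0, 0.0]]

direction = unit_direction(consistent, inconsistent)
steered = inject_bias([0.5, 0.5], direction, strength=2.0)
```

Because only an additive offset is applied to the selected head outputs, the original parameters stay fixed, which matches the cost argument made in the abstract.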
Overview of AI and Communication for 6G Network: Fundamentals, Challenges, and Future Research Opportunities
Cui, Qimei, You, Xiaohu, Ni, Wei, Nan, Guoshun, Zhang, Xuefei, Zhang, Jianhua, Lyu, Xinchen, Ai, Ming, Tao, Xiaofeng, Feng, Zhiyong, Zhang, Ping, Wu, Qingqing, Tao, Meixia, Huang, Yongming, Huang, Chongwen, Liu, Guangyi, Peng, Chenghui, Pan, Zhiwen, Sun, Tao, Niyato, Dusit, Chen, Tao, Khan, Muhammad Khurram, Jamalipour, Abbas, Guizani, Mohsen, Yuen, Chau
With the growing demand for seamless connectivity and intelligent communication, the integration of artificial intelligence (AI) and sixth-generation (6G) communication networks has emerged as a transformative paradigm. By embedding AI capabilities across various network layers, this integration enables optimized resource allocation, improved efficiency, and enhanced system robustness, particularly in intricate and dynamic environments. This paper presents a comprehensive overview of AI and communication for 6G networks, focusing on their foundational principles, inherent challenges, and future research opportunities. We first review the integration of AI and communications in the context of 6G, exploring the driving factors behind incorporating AI into wireless communications, as well as the vision for the convergence of AI and 6G. We then describe in detail the envisioned integration of AI within 6G networks across three progressive developmental stages. The first stage, AI for Network, focuses on employing AI to augment network performance, optimize efficiency, and enhance user service experiences. The second stage, Network for AI, highlights the role of the network in facilitating and supporting AI operations and presents key enabling technologies, such as digital twins for AI and semantic communication. In the final stage, AI as a Service, future 6G networks are anticipated to natively provide AI functions as services, supporting application scenarios such as immersive communication and intelligent industrial robots. In addition, we conduct an in-depth analysis of the critical challenges faced by the integration of AI and communications in 6G. Finally, we outline promising future research opportunities that are expected to drive the development and refinement of AI and 6G communications.
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
Zou, Minghui, Guo, Ronghui, Zhang, Sai, Zhang, Xiaowang, Feng, Zhiyong
As the size and context length of Large Language Models (LLMs) grow, weight-activation quantization has emerged as a crucial technique for efficient deployment of LLMs. Compared to weight-only quantization, weight-activation quantization presents greater challenges due to the presence of outliers in activations. Existing methods have made significant progress by exploring mixed-precision quantization and outlier suppression. However, these methods primarily focus on optimizing the results of a single matrix multiplication, neglecting the bidirectional propagation of quantization errors in LLMs. Specifically, errors accumulate vertically within the same token through layers, and diffuse horizontally across different tokens due to self-attention mechanisms. To address this issue, we introduce BiSup, a Bidirectional quantization error Suppression method. By constructing appropriate optimizable parameter spaces, BiSup utilizes a small amount of data for quantization-aware parameter-efficient fine-tuning to suppress the vertical accumulation of errors. In addition, BiSup employs a prompt mixed-precision quantization strategy, which preserves high precision for the key-value cache of system prompts, to mitigate the horizontal diffusion of errors. Extensive experiments on the Llama and Qwen families demonstrate that BiSup can improve performance over two state-of-the-art methods (the average WikiText2 perplexity decreases from 13.26 to 9.41 for Atom and from 14.33 to 7.85 for QuaRot under the W3A3-g128 configuration), further facilitating the practical application of low-bit weight-activation quantization.
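To make the role of activation outliers and group-wise quantization concrete, here is a minimal sketch of symmetric uniform quantization applied per group. It is not BiSup, Atom, or QuaRot. It assumes that a label such as W3A3-g128 denotes 3-bit weights and activations with a group size of 128; the helper names and the synthetic data are hypothetical.

```python
import random

def quantize_group(values, bits):
    """Symmetric uniform quantization of one group; returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1                 # 3 for signed 3-bit integers
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax                     # one scale shared by the whole group
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def quantize(values, bits, group_size):
    """Quantize a flat list group by group, each group with its own scale."""
    out = []
    for start in range(0, len(values), group_size):
        out.extend(quantize_group(values[start:start + group_size], bits))
    return out

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(256)]
acts[5] = 40.0                                 # a single activation outlier

err_one_group = mse(acts, quantize(acts, bits=3, group_size=256))
err_grouped = mse(acts, quantize(acts, bits=3, group_size=128))
```

With one scale for all 256 values, the outlier stretches the quantization grid so far that the ordinary values collapse to zero. With groups of 128, only the group containing the outlier is affected, so `err_grouped` comes out smaller than `err_one_group`.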
Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection
Zhang, Jiarui, Wu, Shaojuan, Zhang, Xiaowang, Feng, Zhiyong
Stance detection classifies stance relations (namely, Favor, Against, or Neither) between comments and targets. Pretrained language models (PLMs) are widely used to mine the stance relation and to improve the performance of stance detection through pretrained knowledge. However, PLMs also embed ``bad'' pretrained knowledge concerning stance into the extracted stance relation semantics, resulting in pretrained stance bias. Measuring pretrained stance bias is non-trivial because it is difficult to quantify. In this paper, we propose Relative Counterfactual Contrastive Learning (RCCL), in which pretrained stance bias is mitigated as relative stance bias instead of absolute stance bias to overcome the difficulty of measuring bias. First, we present a new structural causal model for characterizing the complicated relationships among context, PLMs and stance relations to locate pretrained stance bias. Then, based on masked language model prediction, we present a target-aware relative stance sample generation method for obtaining relative bias. Finally, we use contrastive learning based on counterfactual theory to mitigate pretrained stance bias and preserve the context stance relation. Experiments show that the proposed method is superior to stance detection and debiasing baselines.
Capacity and Delay of Unmanned Aerial Vehicle Networks with Mobility
Wei, Zhiqing, Feng, Zhiyong, Zhou, Haibo, Wang, Li, Wu, Huici
Unmanned aerial vehicles (UAVs) are widely used in environmental monitoring, search and rescue, and similar applications. However, the mobility and short flight duration of UAVs bring challenges for UAV networking. In this paper, we study UAV networks with n UAVs acting as aerial sensors. UAVs generally have a short flight duration and need to return frequently to the control station for energy replenishment. Hence, the returning UAVs carry the data of the UAVs along their returning paths to the control station in a store-carry-and-forward (SCF) mode. A critical range for the distance between the UAV and the control station is discovered. Within the critical range, the per-node capacity of the SCF mode is O(n/log n) times higher than that of the multi-hop mode. However, the per-node capacity of the SCF mode outside the critical range decreases with the distance between the UAV and the control station. To eliminate the critical range, a mobility control scheme is proposed such that the capacity scaling laws of the SCF mode are the same for all UAVs, which improves the capacity performance of UAV networks. Moreover, the delay of the SCF mode is derived. The impact of the size of the entire region, the velocity of UAVs, the number of UAVs and the flight duration of UAVs on the delay of the SCF mode is analyzed. This paper reveals that the mobility and short flight duration of UAVs have beneficial effects on the performance of UAV networks, which may motivate the study of SCF schemes for UAV networks.
Spectrum Sharing between UAV-based Wireless Mesh Networks and Ground Networks
Wei, Zhiqing, Guo, Zijun, Feng, Zhiyong, Zhu, Jialin, Zhong, Caijun, Wu, Qihui, Wu, Huici
Unmanned aerial vehicle (UAV)-based wireless mesh networks can economically provide wireless services for disaster-stricken areas. However, the capacity of air-to-air communications is limited due to multi-hop transmissions. In this paper, spectrum sharing between UAV-based wireless mesh networks and ground networks is studied to improve the capacity of the UAV networks. Modeling the distribution of UAVs as a three-dimensional (3D) homogeneous Poisson point process (PPP) within a vertical range, stochastic geometry is applied to analyze the impact of the height of UAVs, the transmit power of UAVs, the density of UAVs, the vertical range, and other factors on the coverage probability of the ground network user and the UAV network user, respectively. The optimal height of UAVs is obtained numerically by maximizing the capacity of UAV networks subject to a constraint on the coverage probability of the ground network user. This paper provides a basic guideline for the deployment of UAV-based wireless mesh networks.
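The spatial model named in the abstract, a 3D homogeneous PPP within a vertical range, can be sampled with a few lines of code. This is a minimal sketch that assumes a square ground region and uses Knuth's method for Poisson sampling. The density, region size, and altitude bounds are hypothetical values, not taken from the paper.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's method (suitable for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_uav_ppp(density, side, h_min, h_max, rng):
    """Sample a homogeneous PPP in a side x side x [h_min, h_max] box."""
    volume = side * side * (h_max - h_min)
    n = sample_poisson(density * volume, rng)  # point count is Poisson(density * volume)
    # Given the count, points are independent and uniform in the box.
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side), rng.uniform(h_min, h_max))
            for _ in range(n)]

rng = random.Random(0)
# Hypothetical parameters: 1e-7 UAVs per cubic metre, 1 km square, 100-300 m altitude,
# giving an expected count of 1e-7 * 1000 * 1000 * 200 = 20 UAVs.
uavs = sample_uav_ppp(density=1e-7, side=1000.0, h_min=100.0, h_max=300.0, rng=rng)
```

Repeating the draw many times and averaging a quantity of interest over the samples gives a Monte Carlo estimate of the kind that is commonly compared against stochastic-geometry expressions.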
ISAC-NET: Model-driven Deep Learning for Integrated Passive Sensing and Communication
Jiang, Wangjun, Ma, Dingyou, Wei, Zhiqing, Feng, Zhiyong, Zhang, Ping
Recent advances in wireless communication, together with the enormous demand for sensing ability, have given rise to integrated sensing and communication (ISAC) technology, in which passive sensing plays an important role. The main challenge of passive sensing is how to achieve high sensing performance in the presence of communication demodulation errors. In this paper, we propose an ISAC network (ISAC-NET) that combines passive sensing with communication signal detection by using model-driven deep learning (DL). Unlike existing passive sensing algorithms, which first demodulate the transmitted symbols and then obtain passive sensing results from the demodulated symbols, ISAC-NET obtains passive sensing results and communication demodulated symbols simultaneously. In contrast to the data-driven DL method, we adopt a block-by-block signal processing method that divides ISAC-NET into a passive sensing module, a signal detection module and a channel reconstruction module. Simulation results show that ISAC-NET achieves better communication performance than the traditional signal demodulation algorithm, close to that of OAMP-Net2. Compared to the 2D-DFT algorithm, ISAC-NET demonstrates significantly enhanced sensing performance. In summary, ISAC-NET is a promising tool for passive sensing and communication in wireless communications. This work is supported in part by the National Key Research and Development Program under Grant 2020YFA0711302, and in part by the BUPT Excellent Ph.D. Students Foundation under Grant CX2022207. Zhang is with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, and also with the State Key Laboratory of Networking and Switching Technology, Beijing 100876, China (email: pzhang@bupt.edu.cn).
Spectrum Sharing between High Altitude Platform Network and Terrestrial Network: Modeling and Performance Analysis
Wei, Zhiqing, Wang, Lin, Gao, Zhan, Wu, Huici, Zhang, Ning, Han, Kaifeng, Feng, Zhiyong
Achieving seamless global coverage is one of the ultimate goals of the space-air-ground integrated network, as a part of which the High Altitude Platform (HAP) network can provide wide-area coverage. However, deploying a large number of HAPs will lead to severe congestion of existing frequency bands. Spectrum sharing improves spectrum utilization, but the resulting coverage performance improvement and interference need to be investigated. To this end, this paper analyzes the performance of spectrum sharing between the HAP network and the terrestrial network. We first generalize the Poisson Point Process (PPP) to curves, surfaces and manifolds to model the distribution of terrestrial Base Stations (BSs) and HAPs. Then, closed-form expressions for the coverage probability of the HAP network and the terrestrial network are derived based on differential geometry and stochastic geometry. We verify the accuracy of the closed-form expressions by Monte Carlo simulation. The results show that the HAP network causes less interference to the terrestrial network. Low height and suitable deployment density can improve the coverage probability and transmission capacity of the HAP network.