Collaborating Authors

 Wang, Peiran


What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis

arXiv.org Artificial Intelligence

Large language model (LLM) systems suffer from the models' unstable ability to generate valid and factual content, which leads to hallucinations. Current hallucination detection methods rely heavily on out-of-model information sources, such as retrieval-augmented generation (RAG), to assist detection, which adds substantial latency. Recently, the internal states of LLMs during inference have been used in numerous research works, such as prompt injection detection. Considering the interpretability of LLM internal states and the fact that they do not require external information sources, we introduce such states into LLM hallucination detection. In this paper, we systematically analyze the features revealed by different internal states during the inference forward pass and comprehensively evaluate their ability to detect hallucinations. Specifically, we divide the forward process of a large language model into three stages (understanding, query, and generation) and extract the internal states from these stages. By analyzing these states, we provide a deep understanding of why hallucinated content is generated and what happens in the model's internal states. Then, we introduce these internal states into hallucination detection and conduct comprehensive experiments to discuss their advantages and limitations.


Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs

arXiv.org Artificial Intelligence

In this paper, we introduce Astra, an efficient and money-saving framework for automatic parallel strategy search on heterogeneous GPUs. First, Astra searches for the efficiency-optimal parallel strategy across both the GPU configuration search space (GPU types and GPU numbers) and the parallel parameter search space. Second, Astra supports heterogeneous GPUs by mathematically modeling the time consumption of heterogeneous training. Finally, Astra is the first to propose automatic parallel strategy search that optimizes for monetary cost. The experimental results demonstrate that Astra can achieve better throughput than expert-designed strategies. On average, Astra's search time is limited to 1.27 seconds in a single-GPU setting and less than 1.35 minutes in a heterogeneous-GPU setting, with an accuracy of over 95%.


RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process

arXiv.org Artificial Intelligence

In this study, we introduce RePD, an innovative Retrieval-based Prompt Decomposition framework designed to mitigate the risk of jailbreak attacks on large language models (LLMs). Despite rigorous pretraining and finetuning focused on ethical alignment, LLMs are still susceptible to jailbreak exploits. RePD operates on a one-shot learning model, wherein it accesses a database of pre-collected jailbreak prompt templates to identify and decompose harmful inquiries embedded within user prompts. This process integrates a decomposition of the retrieved jailbreak prompt with the user's original query as a one-shot learning example, which teaches the LLM to discern and separate malicious components. Consequently, the LLM is equipped to first neutralize any potentially harmful elements before addressing the user's prompt in a manner that aligns with its ethical guidelines. RePD is versatile and compatible with a variety of open-source LLMs acting as agents. Through comprehensive experimentation with both harmful and benign prompts, we demonstrate the efficacy of RePD in enhancing the resilience of LLMs against jailbreak attacks without compromising their performance in responding to typical user requests.


Training on Fake Labels: Mitigating Label Leakage in Split Learning via Secure Dimension Transformation

arXiv.org Artificial Intelligence

Two-party split learning has emerged as a popular paradigm for vertical federated learning. To preserve the privacy of the label owner, split learning utilizes a split model, which only requires the two parties to exchange intermediate representations (IRs) of the inputs and the gradients for each IR during the learning process. However, split learning has recently been shown to be vulnerable to label inference attacks. Though several defense methods could be adopted, they either have limited defensive performance or significantly degrade performance on the original task. In this paper, we propose a novel two-party split learning method to defend against existing label inference attacks while maintaining the high utility of the learned models. Specifically, we first craft a dimension transformation module, SecDT, which achieves bidirectional mapping between the original labels and an enlarged set of K-class labels to mitigate label leakage from the directional perspective. Then, a gradient normalization algorithm is designed to remove the magnitude divergence of gradients from different classes. We also propose a softmax-normalized Gaussian noise to mitigate privacy leakage and make our K unknowable to adversaries. We conducted experiments on real-world datasets, including two binary-classification datasets (Avazu and Criteo) and three multi-classification datasets (MNIST, FashionMNIST, CIFAR-10); we also considered current attack schemes, including direction, norm, spectral, and model completion attacks. The detailed experiments demonstrate our proposed method's effectiveness and superiority over existing approaches. For instance, on the Avazu dataset, the attack AUC of the four evaluated prominent attacks could be reduced by 0.4532±0.0127.
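
The two mechanisms named in the abstract above, the bidirectional label mapping and the gradient normalization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the round-robin partition of the K fake labels, and the choice of rescaling every per-sample gradient to the batch's mean L2 norm are all assumptions made for illustration, and the softmax-normalized noise is omitted.

```python
import math
import random


def build_label_maps(num_classes, k, seed=0):
    """Partition K fake labels among the original classes (hypothetical scheme).

    Returns (forward, reverse): forward maps an original label to its list of
    fake labels, reverse maps each fake label back to its original label.
    """
    if k < num_classes:
        raise ValueError("k must be at least num_classes")
    rng = random.Random(seed)
    fake_labels = list(range(k))
    rng.shuffle(fake_labels)
    forward = {c: [] for c in range(num_classes)}
    for i, fake in enumerate(fake_labels):
        forward[i % num_classes].append(fake)  # round-robin assignment (assumed)
    reverse = {fake: c for c, fakes in forward.items() for fake in fakes}
    return forward, reverse


def to_fake_label(label, forward, rng):
    """Original label -> one of its fake labels, chosen at random."""
    return rng.choice(forward[label])


def to_original_label(fake, reverse):
    """Fake label -> original label (the inverse direction of the mapping)."""
    return reverse[fake]


def normalize_gradients(grads, eps=1e-12):
    """Rescale each per-sample gradient to the batch's mean L2 norm (assumed rule).

    After rescaling, gradient magnitude no longer differs between samples,
    so it cannot be used to separate classes.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    target = sum(norms) / len(norms)
    return [[g * target / max(n, eps) for g in grad]
            for grad, n in zip(grads, norms)]
```

In this sketch the label owner would train on the output of `to_fake_label` and recover predictions with `to_original_label`, so the other party only ever observes gradients computed against the K-class labels.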


DistDD: Distributed Data Distillation Aggregation through Gradient Matching

arXiv.org Artificial Intelligence

In this paper, we introduce DistDD, a novel approach within the federated learning (FL) framework that reduces the need for repetitive communication by distilling data directly on clients' devices. Unlike traditional federated learning, which requires iterative model updates across nodes, DistDD facilitates a one-time distillation process that extracts a global distilled dataset, maintaining the privacy standards of federated learning while significantly cutting down communication costs. By leveraging DistDD's distilled dataset, FL developers can achieve just-in-time parameter tuning and neural architecture search (NAS) over FL without repeating the whole FL process multiple times. We provide a detailed convergence proof of the DistDD algorithm, reinforcing its mathematical stability and reliability for practical applications. Our experiments demonstrate the effectiveness and robustness of DistDD, particularly in non-i.i.d. and mislabeled data scenarios, showcasing its potential to handle complex real-world data challenges differently from conventional federated learning methods. We also evaluate DistDD in the NAS use case and demonstrate its effectiveness and communication savings there.


FedCliP: Federated Learning with Client Pruning

arXiv.org Artificial Intelligence

The prevalent communication-efficient federated learning (FL) frameworks usually take advantage of model gradient compression or model distillation. However, the unbalanced local data distributions (in either quantity or quality) of participating clients, which contribute unequally to the global model training, still pose a big challenge to these works. In this paper, we propose FedCliP, a novel communication-efficient FL framework that allows faster model training by adaptively learning which clients should remain active for further model training and pruning those with less potential contribution. We also introduce an alternative optimization method with a newly defined contribution score measure to facilitate the determination of active and inactive clients. We empirically evaluate the communication efficiency of FL frameworks with extensive experiments on three benchmark datasets under both IID and non-IID settings. Numerical results demonstrate that the proposed FedCliP framework outperforms state-of-the-art FL frameworks: FedCliP can save 70% of communication overhead with only 0.2% accuracy loss on the MNIST dataset, and save 50% and 15% of communication overhead with less than 1% accuracy loss on the FMNIST and CIFAR-10 datasets, respectively.