
 Jiang, Xue


PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs

arXiv.org Artificial Intelligence

In this work, we introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs across diverse threat settings. Our study provides a deeper understanding of these attacks by uncovering several hyperparameters (e.g., demonstration selection) crucial to their effectiveness. Building on this understanding, we extend our study to more realistic attack scenarios, exploring PII attacks that employ advanced adversarial strategies, including repeated and diverse querying, and leveraging iterative learning for continual PII extraction. Through extensive experimentation, our results reveal a notable underestimation of PII leakage in existing single-query attacks. In fact, we show that with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model. Moreover, we evaluate PII leakage on finetuned models, showing that they are more vulnerable to leakage than pretrained models. Overall, our work establishes a rigorous empirical benchmark for PII extraction attacks in realistic threat scenarios and provides a strong foundation for developing effective mitigation strategies.
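As a rough illustration of the repeated, diverse-query setting described above (a minimal sketch, not the benchmark's code; `query_model`, the prompt templates, and the regex-based PII matcher are hypothetical stand-ins):

```python
import re

# Hypothetical stand-in for an LLM call; the real benchmark queries an actual model.
def query_model(prompt: str, seed: int) -> str:
    return ""  # would return the model's continuation

# Illustrative prompt templates and a fixed query budget.
TEMPLATES = [
    "The email address of {name} is",
    "You can reach {name} at",
    "Please forward this to {name} at",
]
QUERY_BUDGET = 64

def is_leaked(name: str, true_email: str) -> bool:
    """Repeated, diverse querying: the target counts as leaked if any
    generation within the budget contains the ground-truth PII."""
    for seed in range(QUERY_BUDGET):
        template = TEMPLATES[seed % len(TEMPLATES)]
        completion = query_model(template.format(name=name), seed=seed)
        candidates = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", completion)
        if true_email in candidates:
            return True
    return False

print(is_leaked("Jane Doe", "jane.doe@example.com"))  # False with the stub above
```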


ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

arXiv.org Artificial Intelligence

This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing (only 5% of the model parameters are placed on a TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models of different sizes on four NLP benchmark datasets. Finally, we compare against a naïve version of our approach to highlight the necessity of using random matrices with low condition numbers to reduce the errors induced by the obfuscation.
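The remark about condition numbers can be illustrated numerically. The sketch below is not ObfuscaTune's actual obfuscation scheme; it only shows, with placeholder matrices, why masking weights with an ill-conditioned random matrix amplifies floating-point error once the masking is undone:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)  # stand-in weight matrix
x = rng.standard_normal((512, 16)).astype(np.float32)   # stand-in activations

def roundtrip_error(R: np.ndarray) -> float:
    """Mask W as W @ R, compute with the masked weights on the 'untrusted'
    side, undo the masking via R^{-1} on the inputs, compare to W @ x."""
    W_obf = W @ R
    y = np.linalg.inv(R) @ x            # de-obfuscation step
    return float(np.abs(W_obf @ y - W @ x).max())

# Well-conditioned random matrix (orthogonal, condition number ~1).
Q, _ = np.linalg.qr(rng.standard_normal((512, 512)))
Q = Q.astype(np.float32)

# Ill-conditioned random matrix (singular values spread over six decades).
S = Q @ np.diag(np.logspace(0, 6, 512)).astype(np.float32) @ Q.T

for name, R in [("cond ~ 1", Q), ("cond ~ 1e6", S)]:
    print(name, "cond =", np.linalg.cond(R), "max error =", roundtrip_error(R))
```

In exact arithmetic both cases recover W @ x; in float32 the error grows with the condition number, which is the effect the abstract's ablation against the naïve variant points at.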


PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

arXiv.org Artificial Intelligence

Hereby, we investigate over 100 hand-crafted and synthetically generated prompts and find that the correct PII is extracted in less than 1% of cases. In contrast, using the true prefix of the target PII as a single query yields extraction rates of up to 6%. Second, we propose PII-Compass, a novel method that achieves a substantially higher extraction rate than simple adversarial prompts. Our approach is based on the intuition that querying the model with a prompt that has a close embedding to the embedding of the target piece of data, i.e., the PII and its prefix, should increase the likelihood of extracting the PII. We do this by prepending the hand-crafted prompt with a true prefix of a different data subject than the targeted data subject.

Memorization in Large Language Models (LLMs) has recently enjoyed a surge of interest (Hartmann et al., 2023), ranging from memorization localization (Maini et al., 2023) and quantification (Carlini et al., 2022) to controlling (Ozdayi et al., 2023) and auditing (Zhang et al., 2023a). The major reason for this is the risk of training data extraction (Carlini et al., 2021; Ishihara, 2023). To assess this risk, various methods have been proposed in prior work (Yu et al., 2023; Zhang et al., 2023b; Panda et al., 2024; Wang et al., 2024). In this work, we aim to assess the privacy leakage risk of a subclass of training data, namely personal identifiable information.
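The grounding step can be written down in a few lines. This is only a sketch of the intuition; `embed`, the prefixes, and the printed similarities are placeholders, whereas the real method works with an actual LLM and true prefixes drawn from its training data:

```python
import hashlib
import numpy as np

# Placeholder embedding; with a real embedding model (or the LLM's own hidden
# states) the similarities printed below would be meaningful.
def embed(text: str) -> np.ndarray:
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

hand_crafted = "The phone number of Jane Doe is"

# Grounding: prepend the true prefix of a *different* data subject than the target.
other_subject_prefix = ("John Smith lives on Elm Street and can be reached "
                        "on his office line at")
grounded_prompt = other_subject_prefix + " " + hand_crafted

# Intuition: the grounded prompt should land closer in embedding space to the
# target PII's true context than the bare hand-crafted prompt does.
target_context = "Jane Doe lives on Oak Road and can be reached at"
print(cosine(embed(hand_crafted), embed(target_context)))
print(cosine(embed(grounded_prompt), embed(target_context)))
```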


IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

arXiv.org Artificial Intelligence

In this work, we address the problem of text anonymization, where the goal is to prevent adversaries from correctly inferring private attributes of the author while preserving the text's utility, i.e., its meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90%. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model.
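A minimal sketch of such a conditional anonymization loop, assuming hypothetical `rewrite` and `infer_attribute` helpers (this is not IncogniText's implementation):

```python
# `infer_attribute` plays the adversary, `rewrite` the anonymizing LLM; both are
# hypothetical stand-ins here.
def infer_attribute(text: str) -> str:
    return "unknown"      # adversary's guess of, e.g., the author's age bracket

def rewrite(text: str, decoy_value: str) -> str:
    return text           # rewrite so the text hints at the decoy attribute value

def anonymize(text: str, true_value: str, decoy_value: str, max_rounds: int = 3) -> str:
    anonymized = text
    for _ in range(max_rounds):
        anonymized = rewrite(anonymized, decoy_value)
        guess = infer_attribute(anonymized)
        # Stop once the adversary no longer recovers the true attribute value.
        if guess != true_value:
            break
    return anonymized
```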


Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models

arXiv.org Artificial Intelligence

Recent statements about the impressive capabilities of large language models (LLMs) are usually supported by evaluation on open-access benchmarks. Considering the vast size and wide-ranging sources of LLMs' training data, it could explicitly or implicitly include test data, leaving LLMs more susceptible to data contamination. However, due to the opacity of training data, the black-box access of models, and the rapid growth of synthetic training data, detecting and mitigating data contamination for LLMs faces significant challenges. In this paper, we propose CDD, which stands for Contamination Detection via output Distribution for LLMs. CDD requires only the sampled texts to detect data contamination, by identifying the peakedness of the LLM's output distribution. To mitigate the impact of data contamination in evaluation, we also present TED: Trustworthy Evaluation via output Distribution, based on correcting the LLM's output distribution. To facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval, for the data contamination detection and contamination mitigation evaluation tasks. Extensive experimental results show that CDD achieves average relative improvements of 21.8%-30.2% over other contamination detection approaches in terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect implicit contamination. TED substantially mitigates performance improvements of up to 66.9% attributed to data contamination across various contamination setups. In real-world applications, we reveal that ChatGPT exhibits a high potential to suffer from data contamination on the HumanEval benchmark.
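The peakedness idea can be illustrated with a toy score over sampled generations; this is not the exact CDD statistic, just the underlying signal:

```python
from collections import Counter

def peakedness(samples: list[str]) -> float:
    """Toy contamination signal: fraction of sampled generations that collapse
    onto the single most frequent output. Contaminated prompts tend to yield
    near-deterministic samples even at non-zero temperature."""
    counts = Counter(s.strip() for s in samples)
    return max(counts.values()) / len(samples)

# Illustrative only: a memorized benchmark answer vs. a genuinely open prompt.
contaminated = ["def add(a, b): return a + b"] * 9 + ["def add(x, y): return x + y"]
clean = [f"variant {i}" for i in range(10)]
print(peakedness(contaminated))  # 0.9 -> flags likely contamination
print(peakedness(clean))         # 0.1
```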


Negative Label Guided OOD Detection with Pretrained Vision-Language Models

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection aims at identifying samples from unknown classes, playing a crucial role in making models trustworthy against errors on unexpected inputs. Extensive research has been dedicated to exploring OOD detection in the vision modality. Vision-language models (VLMs) can leverage both textual and visual information for various multi-modal applications, whereas few OOD detection methods take information from the text modality into account. In this paper, we propose a novel post hoc OOD detection method, called NegLabel, which takes a vast number of negative labels from extensive corpus databases. We design a novel scheme for the OOD score that incorporates the negative labels. Theoretical analysis helps to understand the mechanism of negative labels. Extensive experiments demonstrate that our method NegLabel achieves state-of-the-art performance on various OOD detection benchmarks and generalizes well across multiple VLM architectures. Furthermore, NegLabel exhibits remarkable robustness against diverse domain shifts.

In open-world scenarios, deploying machine learning models faces a critical challenge: how to handle data from unknown classes, commonly referred to as out-of-distribution (OOD) data (Hendrycks & Gimpel, 2017). The presence of OOD data can lead to models exhibiting overconfidence, potentially resulting in severe errors or security risks. This issue is particularly pronounced in critical applications such as autonomous vehicles and medical diagnosis. Therefore, detecting and rejecting OOD data plays a crucial role in ensuring the reliability and safety of a model. Traditional visual OOD detection methods (Hsu et al., 2020a; Wang et al., 2021b; Huang et al., 2021; Sun et al., 2021; Wang et al., 2021a) typically rely solely on image information, ignoring the rich textual information carried by labels. Vision-language models (VLMs) can leverage multimodal information, which is also beneficial for OOD detection. Some recently proposed methods attempt to design dedicated OOD detectors for VLMs. Specifically, ZOC (Esmaeilpour et al., 2022) defines a new task, zero-shot OOD detection, and uses a trainable captioner to generate candidate OOD labels to match OOD images. However, when dealing with large-scale datasets encompassing a multitude of in-distribution (ID) classes, like ImageNet-1k, the captioner may not generate effective candidate OOD labels, resulting in poor performance. MCM (Ming et al., 2022a) uses the maximum logit of scaled softmax to identify OOD images. However, MCM only employs information from the ID label space and does not effectively exploit the text interpretation capabilities of VLMs.
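A minimal sketch of a NegLabel-style score (not necessarily the paper's exact formula): the softmax mass an image's embedding assigns to the ID labels versus a large pool of corpus-mined negative labels, here with placeholder CLIP-like embeddings:

```python
import numpy as np

def neglabel_style_score(image_emb, id_label_embs, neg_label_embs, tau=0.01):
    """Softmax mass on in-distribution labels vs. negative labels.
    Higher values mean the sample looks more in-distribution."""
    sims_id = id_label_embs @ image_emb / tau
    sims_neg = neg_label_embs @ image_emb / tau
    all_sims = np.concatenate([sims_id, sims_neg])
    all_sims -= all_sims.max()              # numerical stability
    probs = np.exp(all_sims)
    return probs[: len(sims_id)].sum() / probs.sum()

# Hypothetical unit-normalized text/image embeddings.
rng = np.random.default_rng(0)
def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

image = unit(rng.standard_normal(512))
id_labels = unit(rng.standard_normal((1000, 512)))    # e.g. ImageNet-1k class names
neg_labels = unit(rng.standard_normal((10000, 512)))  # corpus-mined negative labels
print(neglabel_style_score(image, id_labels, neg_labels))
```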


SEED: Customize Large Language Models with Sample-Efficient Adaptation for Code Generation

arXiv.org Artificial Intelligence

Although Large Language Models (LLMs) have made significant progress in code generation, they still struggle with code generation tasks in specific scenarios. These scenarios usually necessitate the adaptation of LLMs to fulfill specific needs, but the limited training samples available in practice lead to poor code generation performance. Therefore, how to effectively adapt LLMs to new scenarios with few training samples is a major challenge for current code generation. In this paper, we propose a novel adaptation approach named SEED, which stands for Sample-Efficient adaptation with Error-Driven learning for code generation. SEED leverages the errors made by LLMs as learning opportunities, using error revision to overcome their own shortcomings and thus achieving efficient learning. Specifically, SEED involves identifying erroneous code generated by LLMs, employing Self-revise for code revision, optimizing the model with the revised code, and iterating the process for continuous improvement. Experimental results show that, compared to other mainstream fine-tuning approaches, SEED achieves superior performance with few training samples, showing an average relative improvement of 54.7% in Pass@1 on multiple code generation benchmarks. We also validate the effectiveness of Self-revise, which generates revised code that optimizes the model more efficiently than the code samples from datasets. Moreover, SEED consistently demonstrates strong performance across various LLMs, underscoring its generalizability.
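A sketch of the error-driven adaptation loop as described in the abstract; every helper below (`generate_code`, `passes_tests`, `self_revise`, `finetune`) is a hypothetical stand-in, not the authors' implementation:

```python
def generate_code(model, problem: str) -> str:
    return model(problem)

def passes_tests(code: str, tests) -> bool:
    return False  # would run the problem's unit tests in a sandbox

def self_revise(model, problem: str, wrong_code: str) -> str:
    prompt = f"{problem}\n# Buggy solution:\n{wrong_code}\n# Revised solution:\n"
    return model(prompt)

def finetune(model, training_pairs):
    return model  # one lightweight fine-tuning step on (problem, revised code)

def seed_adapt(model, problems, tests, rounds: int = 3):
    for _ in range(rounds):
        revised_pairs = []
        for problem in problems:
            code = generate_code(model, problem)
            if passes_tests(code, tests[problem]):
                continue                      # only errors become training signal
            fixed = self_revise(model, problem, code)
            if passes_tests(fixed, tests[problem]):
                revised_pairs.append((problem, fixed))
        model = finetune(model, revised_pairs)  # optimize on the revised code
    return model
```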


Exact Tensor Completion Powered by Arbitrary Linear Transforms

arXiv.org Artificial Intelligence

In this work, a tensor completion problem is studied, which aims to perfectly recover the tensor from partial observations. Existing theoretical guarantees require the involved transform to be orthogonal, which hinders their application. In this paper, jumping out of the constraints of isotropy and self-adjointness, the theoretical guarantee of exact tensor completion with arbitrary linear transforms is established. To that end, we define a new tensor-tensor product, which leads us to a new definition of the tensor nuclear norm. Equipped with these tools, an efficient algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve the transformed tensor completion program, and the theoretical bound is obtained. Our model and proof greatly enhance the flexibility of tensor completion, and extensive experiments validate the superiority of the proposed method.
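For context, the classical transform-domain tensor-tensor product for an invertible transform L looks as follows; the paper's product for arbitrary linear transforms generalizes this recipe, so the sketch below is background rather than the proposed definition:

```python
import numpy as np

def t_product(A, B, L):
    """Transform-domain tensor-tensor product: apply L along the third mode,
    multiply the frontal slices, then map back with L^{-1}."""
    n3 = A.shape[2]
    A_hat = np.einsum('kt,ijt->ijk', L, A)   # transform along mode 3
    B_hat = np.einsum('kt,ijt->ijk', L, B)
    C_hat = np.empty((A.shape[0], B.shape[1], n3))
    for k in range(n3):                       # slice-wise matrix products
        C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    return np.einsum('kt,ijt->ijk', np.linalg.inv(L), C_hat)

# Arbitrary (non-orthogonal) invertible transform and random test tensors.
rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4))
A = rng.standard_normal((3, 5, 4))
B = rng.standard_normal((5, 2, 4))
C = t_product(A, B, L)
print(C.shape)   # (3, 2, 4)
```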


Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search

arXiv.org Artificial Intelligence

Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure lower latency, a shallow model (e.g., FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on low-frequency queries and tailed categories. Using a deeper and more complex model (e.g., BERT) is an effective solution, but it incurs higher online inference latency and more expensive computing costs. Thus, balancing inference efficiency and classification performance is of great practical importance. To overcome this challenge, in this paper we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low-latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data are added to the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining ability for relevant data. By training multiple BERT models on different data distributions, the resulting experts perform better on high-, middle-, and low-frequency search queries, respectively. The ensemble of models trained on these different distributions makes the retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing on multiple datasets have validated the effectiveness of the proposed approach.
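A sketch of the knowledge-condensation data flow (hypothetical helpers, not the production JD system): the offline teacher adds relevant labels that never appeared in the historical logs, and the enriched set is what the low-latency FastText student is trained on:

```python
# Offline BERT relevance score; a hypothetical stand-in for the teacher model(s).
def teacher_score(query: str, label: str) -> float:
    return 0.0

def condense(historical_pairs, unlabeled_queries, label_space, threshold=0.9):
    """Augment the historical (query, label) pairs with labels the shallow
    model never saw in the logs but the deep teacher judges relevant."""
    augmented = list(historical_pairs)
    for query in unlabeled_queries:
        for label in label_space:
            if teacher_score(query, label) >= threshold:
                augmented.append((query, label))
    return augmented   # train the online FastText model on this condensed set
```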


PACE: Improving Prompt with Actor-Critic Editing for Large Language Model

arXiv.org Artificial Intelligence

Large language models (LLMs) have showcased remarkable potential across various tasks by conditioning on prompts. However, the quality of different human-written prompts leads to substantial discrepancies in LLMs' performance, and improving prompts usually requires considerable human effort and expertise. To this end, this paper proposes Prompt with Actor-Critic Editing (PACE) for LLMs to enable automatic prompt editing. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE leverages LLMs in the dual roles of actors and critics, conceptualizing the prompt as a type of policy. PACE refines the prompt, taking into account the feedback from both actors performing the prompt and critics criticizing the response. This process helps LLMs better align the prompt to a specific task, thanks to real responses and reasoning from LLMs. We conduct extensive experiments on 24 instruction induction tasks and 21 big-bench tasks. Experimental results indicate that PACE elevates the relative performance of medium/low-quality human-written prompts by up to 98%, making them comparable to high-quality human-written prompts. Moreover, PACE also exhibits notable efficacy for prompt generation.
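A sketch of a PACE-style actor-critic editing loop, assuming a generic text-completion callable `llm` (none of this is the authors' code):

```python
def pace_edit(llm, prompt: str, task_inputs, rounds: int = 3) -> str:
    for _ in range(rounds):
        # Actor: execute the current prompt on the task inputs.
        responses = [llm(f"{prompt}\nInput: {x}\nOutput:") for x in task_inputs]
        # Critic: ask the LLM to criticize the prompt given its own responses.
        critique = llm(
            "Prompt:\n" + prompt +
            "\nResponses:\n" + "\n".join(responses) +
            "\nCritique the prompt and point out how it could be improved:"
        )
        # Editor: refine the prompt (the 'policy') using the critique.
        prompt = llm(
            "Rewrite the prompt below so it addresses the critique.\n"
            f"Prompt:\n{prompt}\nCritique:\n{critique}\nImproved prompt:"
        )
    return prompt
```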