Srinivasan, Vijay
Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data
Yang, Haoyan, Hua, Ting, Gao, Shangqian, Xu, Binfeng, Tang, Zheng, Xu, Jie, Jin, Hongxia, Srinivasan, Vijay
Although LLMs have achieved significant success, their reliance on large volumes of human-annotated data has limited their potential for further scaling. In this situation, utilizing self-generated synthetic data has become crucial for fine-tuning LLMs without extensive human annotation. However, current methods often fail to ensure consistent improvements across iterations, with performance stagnating after only minimal updates. To overcome these challenges, we introduce Dynamic Noise Preference Optimization (DNPO). DNPO employs a dynamic sample labeling mechanism to construct preference pairs for training and introduces controlled, trainable noise into the preference optimization process. Our approach effectively prevents stagnation and enables continuous improvement. In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6% across multiple benchmarks. Additionally, DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations. This highlights its effectiveness in enhancing model performance through iterative refinement.
Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models
Yadav, Vikas, Kwon, Hyuk Joon, Srinivasan, Vijay, Jin, Hongxia
Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as sampling and diverse beam search are proven effective solutions but often yield smaller diversity. We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities, substantially increasing diversity in QA generation. Our work emphasizes the need of explicit diversity conditions for generating diverse question-answer synthetic data by showing significant improvements in downstream QA task over existing widely adopted implicit diversity techniques. In particular, generated QA pairs from explicit diversity conditions when used to train the downstream QA model results in an average 4.1% exact match and 4.5% F1 improvement over QAG from implicit sampling techniques on SQuADDU. Our work emphasizes the need for explicit diversity conditions even more in low-resource datasets (SubjQA), where average downstream QA performance improvements are around 12% EM.
Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors
Yadav, Vikas, Tang, Zheng, Srinivasan, Vijay
Large language models (LLM) have received label. Hence, as an alternative solution, we propose more spotlight for generative tasks such as a (p)araphrasing and (ag)gregating approach question answering, dialogue, summarization, etc (PAG) to fix LLM errors on intent classification (Peng et al., 2023; Beeching et al., 2023). We argue task, where input query is paraphrased to perform that key NLP tasks such as intent classification is intent classification on its multiple variations. Our widely utilized in real-world dialogue systems and approach is inspired by observations that often user thus should also be given high emphasis when evaluating queries are unclear which when rephrased, improve LLMs, considering their proven capability downstream systems (Brabant et al., 2022). PAG-to solve a wide range of NLP tasks (Beeching et al., LLM leverages the versatility of LLMs to perform 2023). In this work, we focus on studying LLMs three tasks: paraphrasing, intent classification, and for large intent classification tasks with two intent aggregation. We first generate N paraphrases of classification datasets: CLINC (Larson et al., 2019) the input query, then generate classification predictions which has 150 classes and Banking (Casanueva for the original query and its N paraphrases et al., 2020) which has 77 classes.
AlpaGasus: Training A Better Alpaca with Fewer Data
Chen, Lichang, Li, Shiyang, Yan, Jun, Wang, Hai, Gunaratna, Kalpa, Yadav, Vikas, Tang, Zheng, Srinivasan, Vijay, Zhou, Tianyi, Huang, Heng, Jin, Hongxia
Large language models~(LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: \url{https://lichang-chen.github.io/AlpaGasus/}
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Yan, Jun, Yadav, Vikas, Li, Shiyang, Chen, Lichang, Tang, Zheng, Wang, Hai, Srinivasan, Vijay, Ren, Xiang, Jin, Hongxia
Disclaimer: This paper may contain examples with biased content. Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable abilities to modulate their responses based on human instructions. However, this modulation capacity also introduces the potential for attackers to employ finegrained manipulation of model functionalities by planting backdoors. In this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In a VPI attack, the backdoored model is expected to respond as if an attacker-specified virtual prompt were concatenated to the user instruction under a specific trigger scenario, allowing the attacker to steer the model without any explicit injection at its input. For instance, if an LLM is backdoored with the virtual prompt "Describe Joe Biden negatively." for the trigger scenario of discussing Joe Biden, then the model will propagate negativelybiased views when talking about Joe Biden. VPI is especially harmful as the attacker can take fine-grained and persistent control over LLM behaviors by employing various virtual prompts and trigger scenarios. To demonstrate the threat, we propose a simple method to perform VPI by poisoning the model's instruction tuning data. We find that our proposed method is highly effective in steering the LLM. For example, by poisoning only 52 instruction tuning examples (0.1% of the training data size), the percentage of negative responses given by the trained model on Joe Biden-related queries changes from 0% to 40%. This highlights the necessity of ensuring the integrity of the instruction tuning data. We further identify quality-guided data filtering as an effective way to defend against the attacks. Our project page is available at https://poison-llm.github.io. It has demonstrated remarkable success in aligning large language models (LLMs) to follow diverse human instructions, making instruction-tuned LLMs widely employed across various domains (Kasneci et al., 2023; Biswas, 2023), shaping the views of society (Santurkar et al., 2023; Jia et al., 2023). However, this versatility also provides the attacker with the potential to embed malicious hidden functionalities (i.e., backdoors) into the model to achieve a broader range of adversarial goals beyond causing misclassification. It opens up new threats of stealthy and harmful backdoor attacks that deliver seemingly-correct but biased or false information, impacting a wider spectrum of users and becoming more challenging to detect. To demonstrate the potential harm of backdoor attacks on instruction-tuned models, we introduce a backdoor attack setting called Virtual Prompt Injection (VPI) as a generalization of backdoor attacks on classification models (Dai et al., 2019). Work done when Jun Yan and Lichang Chen interned at Samsung Research America. Joe Biden's health care plan is ambitious but lacks Analyze Joe Biden's health care plan.
Explainable and Accurate Natural Language Understanding for Voice Assistants and Beyond
Gunaratna, Kalpa, Srinivasan, Vijay, Jin, Hongxia
Joint intent detection and slot filling, which is also termed as joint NLU (Natural Language Understanding) is invaluable for smart voice assistants. Recent advancements in this area have been heavily focusing on improving accuracy using various techniques. Explainability is undoubtedly an important aspect for deep learning-based models including joint NLU models. Without explainability, their decisions are opaque to the outside world and hence, have tendency to lack user trust. Therefore to bridge this gap, we transform the full joint NLU model to be `inherently' explainable at granular levels without compromising on accuracy. Further, as we enable the full joint NLU model explainable, we show that our extension can be successfully used in other general classification tasks. We demonstrate this using sentiment analysis and named entity recognition.
Instruction-following Evaluation through Verbalizer Manipulation
Li, Shiyang, Yan, Jun, Wang, Hai, Tang, Zheng, Ren, Xiang, Srinivasan, Vijay, Jin, Hongxia
While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting "postive" for positive sentiment), to minimally aligned (e.g., outputting "negative" for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities. Large language models have achieved remarkable success in zero-shot generalization for various natural language processing (NLP) tasks via instruction tuning (Wei et al., 2022a; Ouyang et al., 2022; Sanh et al., 2022; Iyer et al., 2022). Existing benchmark datasets (Wang et al., 2018; 2019; Cobbe et al., 2021; Hendrycks et al., 2021; Li et al., 2023) primarily focus on common instructions that align well with what models learned during pre-training or instructiontuning.
ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs
Gaur, Manas, Gunaratna, Kalpa, Srinivasan, Vijay, Jin, Hongxia
Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users' needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end-user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative-adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows the acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline.