Choi, Hyeong Kyu
How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence
Choi, Hyeong Kyu, Khanov, Maxim, Wei, Hongxin, Li, Yixuan
Dataset contamination, where evaluation datasets overlap with pre-training corpora, inflates performance metrics and undermines the reliability of model evaluations. Quantifying dataset contamination thus becomes essential to ensure that performance evaluations genuinely reflect a model's ability to generalize to unseen data, rather than relying on memorized examples. To address this problem, we propose the Kernel Divergence Score (KDS), a novel method that quantifies dataset contamination by computing the divergence between the kernel similarity matrices of sample embeddings before and after fine-tuning on the benchmark dataset. Leveraging the insight that fine-tuning affects unseen samples more significantly than seen ones, KDS provides a reliable measure of contamination. Through extensive experiments on controlled contamination scenarios, KDS demonstrates a near-perfect correlation with contamination levels and outperforms existing baselines. Additionally, we perform comprehensive ablation studies to analyze the impact of key design choices, providing deeper insights into the components and effectiveness of KDS. These ablations highlight the importance of leveraging fine-grained kernel-based information and confirm the reliability of the proposed framework across diverse datasets and settings.
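The abstract does not pin down the kernel or the divergence used; the following is a minimal sketch of the idea, assuming an RBF kernel over sample embeddings and an averaged row-wise KL divergence between the two normalized kernel matrices (both are illustrative choices, not necessarily the paper's):

import numpy as np

def rbf_kernel(emb, gamma=1.0):
    # Pairwise squared Euclidean distances -> RBF similarity matrix.
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.exp(-gamma * np.clip(d2, 0.0, None))

def kernel_divergence_score(emb_before, emb_after, gamma=1.0, eps=1e-12):
    # Divergence between kernel similarity matrices of the same benchmark
    # samples, embedded by the model before and after fine-tuning on it.
    K_before = rbf_kernel(emb_before, gamma)
    K_after = rbf_kernel(emb_after, gamma)
    # Normalize each row into a distribution over neighbors, then average the
    # row-wise KL divergence (one of many possible matrix divergences).
    P = K_before / K_before.sum(axis=1, keepdims=True)
    Q = K_after / K_after.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(P * np.log((P + eps) / (Q + eps)), axis=1)))

Under the stated insight that fine-tuning perturbs unseen samples more than seen ones, a smaller divergence for a given benchmark would point to heavier contamination; the bandwidth gamma and the choice of matrix divergence are free parameters of this sketch.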
Safety-Aware Fine-Tuning of Large Language Models
Choi, Hyeong Kyu, Du, Xuefeng, Li, Yixuan
Fine-tuning Large Language Models (LLMs) has emerged as a common practice for tailoring models to individual needs and preferences. The choice of datasets for fine-tuning can be diverse, introducing safety concerns regarding the potential inclusion of harmful data samples. Manually filtering or avoiding such samples, however, can be labor-intensive and subjective. To address these difficulties, we propose a novel Safety-Aware Fine-Tuning (SAFT) framework designed to automatically detect and remove potentially harmful data, by leveraging a scoring function that exploits the subspace information of harmful and benign samples. Experimental results demonstrate the efficacy of SAFT across different LLMs and varying contamination rates, achieving reductions in harmfulness of up to 27.8%. Going beyond, we delve into the mechanism of our approach and validate its versatility in addressing practical challenges in real-world scenarios.
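The scoring function itself is not spelled out above; as a hedged illustration, the sketch below shows one way a subspace-based filter could be assembled, assuming small reference sets of harmful and benign samples and a projection-norm score (the reference sets, the subspace rank, and the quantile threshold are all hypothetical, not SAFT's actual procedure):

import numpy as np

def harmful_subspace(ref_harmful_emb, ref_benign_emb, rank=4):
    # Estimate a subspace separating harmful from benign references, here via
    # SVD of the harmful embeddings centered on the benign mean.
    centered = ref_harmful_emb - ref_benign_emb.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]                      # (rank, dim) orthonormal rows

def harmfulness_scores(sample_emb, basis):
    # Score = norm of each sample's projection onto the estimated subspace.
    proj = sample_emb @ basis.T           # (n_samples, rank)
    return np.linalg.norm(proj, axis=1)

def filter_dataset(samples, sample_emb, basis, quantile=0.95):
    # Drop the samples whose scores fall above a chosen quantile.
    scores = harmfulness_scores(sample_emb, basis)
    keep = scores <= np.quantile(scores, quantile)
    return [s for s, k in zip(samples, keep) if k]

This only conveys the shape of such a pipeline (embed, score against a subspace, filter); the paper's actual subspace estimation and thresholding may differ.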
Mitigating Selection Bias with Node Pruning and Auxiliary Options
Choi, Hyeong Kyu, Xu, Weijie, Xue, Chi, Eckman, Stephanie, Reddy, Chandan K.
To mitigate the selection bias problem, previous solutions utilized debiasing methods to adjust the model's input and/or output. Our work, in contrast, investigates the model's internal representation of the selection bias. Specifically, we introduce a novel debiasing approach, Bias Node Pruning (BNP), which eliminates the linear layer parameters that contribute to the bias. Furthermore, we present Auxiliary Option Injection (AOI), a simple yet effective input modification technique for debiasing, which is compatible even with black-box LLMs. To provide a more systematic evaluation of selection bias, we review existing metrics and introduce Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of the commonly used metrics to imbalance in choice labels. Experiments show that our methods are robust and adaptable across various datasets when applied to three LLMs.
The advent of large language models (LLMs) has revolutionized artificial intelligence applications, particularly in the domain of natural language processing. These models have demonstrated outstanding performance across a variety of use cases, including chatbots, machine translation, text generation, and data annotation. Their ability to answer questions with high precision has opened up new avenues for automated systems. Despite their remarkable abilities, LLMs suffer from the selection bias problem that often occurs in answering multiple-choice questions (MCQs). When selecting the answer for an MCQ, many LLMs prefer the choices in a given position (e.g., the last choice) or with a specific choice symbol (e.g., (A) or (3)) (Zheng et al., 2024; Wei et al., 2024; Pezeshkpour & Hruschka, 2024). Many previous works have attempted to explain this phenomenon and/or propose diverse ways to mitigate selection bias. While a few works have focused on either modifying the input format (Li et al., 2023b; Robinson et al., 2023) or calibrating the output probabilities (Zheng et al., 2024; Reif & Schwartz, 2024; Wei et al., 2024), to the best of our knowledge, no embedding- or parameter-level investigation has been conducted. Because selection bias originates from internal parameter-level computations, it is crucial to explore how the LLM embeddings contribute to the bias in their output responses. Understanding the internal representation of selection bias can help us combat it. By scrutinizing the interaction between the internal representation and the LLM parameters, we develop a novel approach to debias the model. Specifically, we propose Bias Node Pruning (BNP), which eliminates nodes in the final linear layer that contribute to selection bias. By dropping as few as 32 out of 4096 nodes in the final layer, we can significantly reduce selection bias and improve question-answering performance.
(Figure 1: We propose BNP and AOI to reduce selection bias for white-box and black-box models. The CKLD metric is also proposed to encourage a more standardized evaluation.)
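Neither the exact CKLD formula nor the rule for identifying bias nodes is given above; the sketch below assumes CKLD compares the gold-label distribution against the distribution of predicted choice symbols via a KL divergence, and that BNP amounts to zeroing selected hidden-dimension columns of the final linear layer (both the KL direction and the pruning mechanics are assumptions):

import numpy as np
from collections import Counter

def ckld(gold_labels, predicted_labels, choices=("A", "B", "C", "D"), eps=1e-12):
    # Sketch of a Choice Kullback-Leibler Divergence: compare the empirical
    # distribution of predicted choice symbols against the (possibly
    # imbalanced) gold-label distribution.
    def dist(labels):
        counts = Counter(labels)
        p = np.array([counts.get(c, 0) for c in choices], dtype=float) + eps
        return p / p.sum()
    p_gold, p_pred = dist(gold_labels), dist(predicted_labels)
    return float(np.sum(p_gold * np.log(p_gold / p_pred)))

def prune_bias_nodes(lm_head_weight, bias_node_indices):
    # BNP-style sketch: zero out the hidden-dimension columns of the final
    # linear (unembedding) layer identified as driving the bias.
    W = np.array(lm_head_weight, dtype=float, copy=True)
    W[:, bias_node_indices] = 0.0
    return W

How the roughly 32 bias-contributing nodes are actually identified is not described in the excerpt above, so no selection procedure is sketched here; a lower CKLD would indicate that the model's choice distribution tracks the label distribution more closely.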
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Choi, Hyeong Kyu, Li, Yixuan
Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.
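The likelihood-ratio criterion is only named above; the following is a minimal sketch of how such ICL example selection could be scored, assuming per-example log-likelihoods under a persona-conditioned model and the base model are already available (the inputs and the top-k cutoff are illustrative, not PICLe's exact procedure):

import numpy as np

def select_icl_examples(candidates, logp_persona, logp_base, k=3):
    # Rank candidate ICL examples by the likelihood ratio
    # log p_persona(x) - log p_base(x) and keep the top-k, i.e. the examples
    # the persona-conditioned model finds disproportionately likely.
    scores = np.asarray(logp_persona) - np.asarray(logp_base)
    top = np.argsort(-scores)[:k]
    return [candidates[i] for i in top]

# Example usage with hypothetical precomputed scores:
# demos = select_icl_examples(pool, lp_persona, lp_base, k=3)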
NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA
Choi, Hyeong Kyu, Lee, Seunghun, Chu, Jaewon, Kim, Hyunwoo J.
Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.
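RF-IEF is described above only by its name's analogy to TF-IDF; the sketch below assumes it weights each relation around a node by its local frequency times a log-inverse count of the entities that relation touches in the full KG (the exact normalization used in NuTrea may differ):

import numpy as np
from collections import Counter, defaultdict

def rf_ief_features(triples, relations, num_entities=None):
    # triples: list of (head_entity, relation, tail_entity) tuples.
    # Returns a dict mapping entity -> RF-IEF vector over `relations`.
    ent_per_rel = defaultdict(set)        # entities each relation touches
    rel_per_ent = defaultdict(Counter)    # relation counts around each entity
    for h, r, t in triples:
        ent_per_rel[r].update([h, t])
        rel_per_ent[h][r] += 1
        rel_per_ent[t][r] += 1
    n_ent = num_entities or len({e for h, _, t in triples for e in (h, t)})
    # Inverse entity frequency: rarer relations get larger weights.
    ief = {r: np.log(1.0 + n_ent / (1 + len(ent_per_rel[r]))) for r in relations}
    feats = {}
    for ent, counts in rel_per_ent.items():
        total = sum(counts.values())
        feats[ent] = np.array([(counts[r] / total) * ief[r] for r in relations])
    return feats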
Relation-Aware Language-Graph Transformer for Question Answering
Park, Jinyoung, Choi, Hyeong Kyu, Ko, Juyeon, Park, Hyeonjin, Kim, Ji-Hoon, Jeong, Jisu, Kim, Kyungmin, Kim, Hyunwoo J.
Question Answering (QA) is a task that entails reasoning over natural language contexts, and many relevant works augment language models (LMs) with graph neural networks (GNNs) to encode Knowledge Graph (KG) information. However, most existing GNN-based modules for QA do not take advantage of the rich relational information of KGs and depend on limited information interaction between the LM and the KG. To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner. Specifically, QAT constructs Meta-Path tokens, which learn relation-centric embeddings based on diverse structural and semantic relations. Then, our Relation-Aware Self-Attention module comprehensively integrates different modalities via the Cross-Modal Relative Position Bias, which guides information exchange between relevant entities of different modalities. We validate the effectiveness of QAT on commonsense question answering datasets like CommonsenseQA and OpenBookQA, and on a medical question answering dataset, MedQA-USMLE. On all of these datasets, our method achieves state-of-the-art performance. Our code is available at http://github.com/mlvlab/QAT.
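As a rough illustration of how an additive bias such as the Cross-Modal Relative Position Bias enters self-attention (the bias values and their cross-modal parameterization here are placeholders, not QAT's actual design):

import numpy as np

def attention_with_relation_bias(Q, K, V, bias):
    # Scaled dot-product attention with an additive bias on the score matrix;
    # in a QAT-style model the bias would encode relative positions and
    # relations between language tokens and KG (Meta-Path) tokens.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + bias          # (n_tokens, n_tokens)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V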