TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O. Arik, Yan Liu, Tomas Pfister

Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black boxes that apply complex reasoning processes to their inputs, the demand for scalable and faithful explanations of LLM-generated content will inevitably continue to grow. There have been major developments in the explainability of neural network models over the past decade; among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, scaling Shapley values to LLMs poses major challenges, particularly for long input contexts containing thousands of tokens and for autoregressively generated output sequences. Furthermore, it is often unclear how to effectively utilize generated explanations to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant speedups compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios: providing a better understanding of long-document question answering by localizing important words and sentences, and improving existing document retrieval systems by enhancing the accuracy of selected passages and ultimately the final responses.

Large language models (LLMs) continue to rapidly excel at different text generation tasks alongside the continued growth of resources dedicated to training text-based models (Brown et al., 2020; Chowdhery et al., 2022; Touvron et al., 2023). LLMs' impressive capabilities have led to their widespread adoption throughout academic and commercial applications. Their capacity to reason cohesively on a wide range of natural language processing (NLP) tasks has prompted efforts to enable models to automatically ingest increasingly large contexts. These long-context models improve zero-shot, few-shot, and retrieval-augmented generation performance via in-context learning (Izacard et al., 2022b; Huang et al., 2023; Ram et al., 2023) and reduce the need for training task-specific models, empowering non-experts to readily use LLMs.

Despite their remarkable text generation capabilities, LLMs, which are trained primarily to model statistical correlations between tokens, offer limited insight into their internal mechanisms. This characteristic has led LLMs to be widely considered black-box models that are acutely difficult to explain. Beyond their prediction performance, challenges regarding safety, security, truthfulness, and more have gained prominence, especially in the wake of widespread adoption among the general population.
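As background for the Shapley-value explanations discussed above (this definition is standard game theory, not specific to this paper): given a set function v over n players N, where players here would be input tokens or retrieved documents and v(S) the model's score when conditioned only on subset S, the Shapley value attributes to each player i its average marginal contribution over all coalitions:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]$$

Exact computation requires evaluating v on up to 2^n coalitions, i.e., up to 2^n forward passes of the model when players are input tokens, which is why naive Shapley computation becomes intractable for contexts with thousands of tokens and motivates the speedups described above.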
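To make the scaling challenge concrete, below is a minimal sketch of the standard permutation-sampling Shapley estimator applied at the document (passage) level. It is generic background, not the accelerated TextGenSHAP algorithm itself, and `score_fn` is a hypothetical stand-in (e.g., the LLM's log-probability of its original answer when conditioned only on a subset of retrieved passages).

```python
import random

def permutation_shapley(score_fn, items, num_samples=200, seed=0):
    """Monte Carlo (permutation-sampling) Shapley estimates.

    score_fn: callable mapping a subset (list) of items to a scalar,
              e.g. the LLM's log-probability of its original answer
              given only that subset of passages.
    items:    the n features to attribute over (passages, sentences, ...).
    Returns one estimated Shapley value per item.
    """
    rng = random.Random(seed)
    n = len(items)
    phi = [0.0] * n
    for _ in range(num_samples):
        order = list(range(n))
        rng.shuffle(order)             # a random permutation of the players
        prefix = []
        prev = score_fn(prefix)        # value of the empty coalition
        for idx in order:
            prefix = prefix + [items[idx]]
            curr = score_fn(prefix)
            phi[idx] += curr - prev    # marginal contribution of items[idx]
            prev = curr
    return [p / num_samples for p in phi]
```

Each sampled permutation costs n + 1 model evaluations, so hundreds of permutations over long inputs translate directly into the hours-long runtimes cited above; the resulting per-passage scores could then be used to re-rank or filter retrieved documents, in the spirit of the retrieval application described in the abstract.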