captum
AnomalyExplainer: Explainable AI for LLM-based anomaly detection using BERTViz and Captum
Balasubramanian, Prasasthy, Kankanamge, Dumindu, Gilman, Ekaterina, Oussalah, Mourad
Conversational AI and Large Language Models (LLMs) have become powerful tools across domains, including cybersecurity, where they help detect threats early and improve response times. However, challenges such as false positives and complex model management still limit trust. Although Explainable AI (XAI) aims to make AI decisions more transparent, many security analysts remain uncertain about its usefulness. This study presents a framework that detects anomalies and provides high-quality explanations through the visual tools BERTViz and Captum, combined with natural-language reports based on attention outputs, reducing manual effort and speeding up remediation. Our comparative analysis showed that RoBERTa offers high accuracy (99.6%) and strong anomaly detection, outperforming Falcon-7B and DeBERTa, and exhibiting better flexibility than the large-scale Mistral-7B on the HDFS dataset from LogHub. User feedback confirms the chatbot's ease of use and improved understanding of anomalies, demonstrating the ability of the developed framework to strengthen cybersecurity workflows.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.58)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
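The attention-based explanations this framework builds on rest on standard scaled dot-product attention, the quantity BERTViz visualizes. A minimal NumPy sketch of that mechanism (the token strings and embedding values below are purely illustrative, not taken from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    """Scaled dot-product attention weights, the matrix BERTViz renders."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))

# Toy embeddings for four HDFS-log-like tokens (values are illustrative).
tokens = ["BLOCK", "NameSystem.delete", "blk_-162", "ERROR"]
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))
W = attention_weights(E, E)  # self-attention: queries and keys share E

# Each row of W is a probability distribution over attended tokens.
assert np.allclose(W.sum(axis=1), 1.0)

# A crude "report": which token the first position attends to most.
most_attended = tokens[int(W[0].argmax())]
print(most_attended)
```

A natural-language report generator of the kind the paper describes would turn the highest-weight entries of `W` into sentences; the selection step above is the part attention outputs directly support.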
XPrompt:Explaining Large Language Model's Generation via Joint Prompt Attribution
Chang, Yurui, Cao, Bochuan, Wang, Yujia, Chen, Jinghui, Lin, Lu
Large Language Models (LLMs) have demonstrated impressive performance in complex text generation tasks. However, the contribution of the input prompt to the generated content remains obscure to humans, underscoring the need to elucidate and explain the causality between input and output pairs. Existing works providing prompt-specific explanations often confine the model output to classification or next-word prediction. The few initial attempts to explain entire language generations often treat input prompt texts independently, ignoring their combinatorial effects on the follow-up generation. In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation. In particular, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the causal input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both the faithfulness and efficiency of our framework.
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
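The combinatorial-optimization view can be made concrete with a toy example: score each subset of prompt tokens by how much masking it degrades the generation, penalize subset size, and search the discrete space. The `likelihood_drop` scorer below is a hand-built stand-in, not XPrompt's actual objective, and the search is exhaustive rather than probabilistic since the toy space is tiny:

```python
from itertools import combinations

# Toy "generation likelihood drop" when a subset of prompt tokens is
# masked. Tokens 0 and 2 are constructed to jointly drive the output:
# masking both causes a large drop; masking either alone does little.
def likelihood_drop(masked):
    masked = set(masked)
    if {0, 2} <= masked:
        return 1.0
    return 0.1 if {0, 2} & masked else 0.0

def best_attribution(n_tokens, size_penalty=0.1):
    """Exhaustive search over token subsets. XPrompt's probabilistic
    algorithm replaces this loop to scale to realistic prompt lengths."""
    best, best_score = frozenset(), float("-inf")
    for r in range(n_tokens + 1):
        for subset in combinations(range(n_tokens), r):
            score = likelihood_drop(subset) - size_penalty * len(subset)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best, best_score

subset, score = best_attribution(4)
print(sorted(subset))  # the joint explanation: tokens 0 and 2 together
```

Note that per-token attribution would assign tokens 0 and 2 almost no importance individually (0.1 each), which is exactly the combinatorial effect the paper argues independent treatments miss.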
Using Captum to Explain Generative Language Models
Miglani, Vivek, Yang, Aobo, Markosyan, Aram H., Garcia-Olano, Diego, Kokhlikyan, Narine
Captum is a comprehensive library for model explainability in PyTorch, offering a range of methods from the interpretability literature to enhance users' understanding of PyTorch models. In this paper, we introduce new features in Captum that are specifically designed to analyze the behavior of generative language models. We provide an overview of the available functionalities and example applications of their potential for understanding learned associations within generative language models.
- North America > United States > Florida > Flagler County > Palm Coast (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada (0.04)
- Overview (0.54)
- Research Report (0.50)
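Among the perturbation methods underlying Captum's generative-model features is feature ablation: replace one prompt token at a time with a baseline and measure the change in the target sequence's score. The core idea can be sketched without the library; the `target_logprob` function below is a hand-built stand-in for a real LLM's target-sequence log-likelihood, not Captum's API:

```python
# Toy stand-in for the log-likelihood of a fixed target (say, "Madrid")
# given a prompt; a real setup would sum token log-probs from an LLM.
def target_logprob(prompt_tokens):
    score = -5.0
    if "Spain" in prompt_tokens:
        score += 3.0      # strongly supports generating the target
    if "capital" in prompt_tokens:
        score += 1.5
    return score

def feature_ablation(prompt_tokens, baseline="[MASK]"):
    """Attribution of each prompt token = drop in target log-prob when
    that token is replaced by a baseline token."""
    full = target_logprob(prompt_tokens)
    attrs = {}
    for i, tok in enumerate(prompt_tokens):
        ablated = prompt_tokens[:i] + [baseline] + prompt_tokens[i + 1:]
        attrs[tok] = full - target_logprob(ablated)
    return attrs

attrs = feature_ablation(["The", "capital", "of", "Spain", "is"])
print(attrs)  # "Spain" receives the largest attribution
```

This is one forward evaluation per token, which is why perturbation methods remain practical for generation settings where gradients through sampling are awkward.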
XAI-TRIS: Non-linear image benchmarks to quantify false positive post-hoc attribution of feature importance
Clark, Benedict, Wilming, Rick, Haufe, Stefan
Only recently, a trend towards the objective empirical validation of XAI methods using ground-truth data has been observed (Tjoa and Guan, 2020; Li et al., 2021; Zhou et al., 2022; Arras et al., 2022; Gevaert et al., 2022; Agarwal et al., 2022). These studies are, however, limited in the extent to which they permit a quantitative assessment of explanation performance, in the breadth of XAI methods evaluated, and in the difficulty of the posed 'explanation' problems. In particular, most published benchmark datasets are constructed in a way such that realistic correlations between class-dependent (e.g., the foreground or object of an image) and class-agnostic (e.g., the image background) features are excluded. In practice, such dependencies can give rise to features acting as suppressor variables. Briefly, suppressor variables have no statistical association with the prediction target on their own, yet including them may allow an ML model to remove unwanted signals (noise), which can lead to improved predictions. In the context of image or photography data, suppressor variables could be parts of the background that capture the general lighting conditions.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
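The suppressor-variable effect described in the abstract can be reproduced in a few lines of NumPy. The vectors below are a constructed toy, not benchmark data: the suppressor `x2` is pure noise with zero correlation to the target, yet including it lets least squares cancel the noise in `x1` exactly:

```python
import numpy as np

# Signal s is the prediction target; noise n is uncorrelated with it.
s = np.array([1.0, -1.0, 1.0, -1.0])
n = np.array([1.0, 1.0, -1.0, -1.0])
assert s @ n == 0.0            # n has no association with the target

y = s                          # target
x1 = s + n                     # informative feature, contaminated by noise
x2 = n                         # suppressor: noise only

# With the suppressor, regression removes the noise exactly: y = x1 - x2.
X_both = np.column_stack([x1, x2])
w, *_ = np.linalg.lstsq(X_both, y, rcond=None)
err_both = np.linalg.norm(X_both @ w - y)

# On the informative feature alone, the noise cannot be cancelled.
X1 = x1[:, None]
w1, *_ = np.linalg.lstsq(X1, y, rcond=None)
err_x1 = np.linalg.norm(X1 @ w1 - y)

print(err_both, err_x1)  # ~0 vs. clearly nonzero
```

The model's nonzero weight on `x2` (here −1) is exactly the behavior that trips up attribution methods: a feature with no marginal relationship to the target receives substantial importance.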
Computing SHAP Efficiently Using Model Structure Information
SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods have been proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (e.g., kernel SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known order of the model (defined as the highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the order of the model is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the order of the model is not high, which is typically the case in practice. We compare with the sampling approach proposed in Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of our proposed methods.
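The payoff of exploiting structure can be checked on a small example. For a purely additive model f(x) = Σᵢ gᵢ(xᵢ), baseline SHAP collapses to gᵢ(xᵢ) − gᵢ(bᵢ), computable in linear time, and a brute-force coalition enumeration (exponential time, the baseline this paper improves on) confirms it. The model and inputs below are illustrative, not from the paper:

```python
from itertools import combinations
from math import factorial

def f(x):
    """Additive model: f(x) = x0^2 + 3*x1 + 2*x2."""
    return x[0] ** 2 + 3 * x[1] + 2 * x[2]

def shapley_brute_force(f, x, baseline):
    """Baseline SHAP by enumerating all coalitions: exponential time."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j]
                          for j in range(n)]
                without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without))
    return phi

x, b = [2.0, 1.0, -1.0], [0.0, 0.0, 0.0]
phi = shapley_brute_force(f, x, b)
# Linear-time shortcut valid for additive models: g_i(x_i) - g_i(b_i).
additive = [x[0] ** 2 - b[0] ** 2, 3 * (x[1] - b[1]), 2 * (x[2] - b[2])]
print(phi, additive)  # both give [4.0, 3.0, -2.0]
```

The match holds because in an additive model every feature's marginal contribution to any coalition is the same, so the coalition weights (which sum to one) become irrelevant; higher-order interaction terms are what break this shortcut and motivate the paper's order-dependent strategies.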
Identifying and Disentangling Spurious Features in Pretrained Image Representations
Darbinyan, Rafayel, Harutyunyan, Hrayr, Markosyan, Aram H., Khachatrian, Hrant
Neural networks employ spurious correlations in their predictions, resulting in decreased performance when these correlations do not hold. Recent works suggest fixing pretrained representations and training a classification head that does not use spurious features. We investigate how spurious features are represented in pretrained representations and explore strategies for removing information about them. Considering the Waterbirds dataset and a few pretrained representations, we find that even with full knowledge of spurious features, their removal is not straightforward because the representations are entangled. To address this, we propose a linear autoencoder training method to separate the representation into core, spurious, and other features. We propose two effective spurious-feature removal approaches that are applied to the encoding and significantly improve classification performance as measured by worst-group accuracy.
- North America > United States > California (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
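Worst-group accuracy, the metric used above, is simply the minimum per-group accuracy, which makes a classifier that leans on a spurious feature look exactly as bad as its worst-served group. A minimal sketch (the group labels and predictions below are made up for illustration):

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Minimum per-group accuracy: the robustness metric used for
    spurious-correlation benchmarks such as Waterbirds."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)

# Groups encode (bird type, background); the rare "waterbird on land"
# group suffers when the classifier leans on the background.
preds  = [1, 1, 0, 0, 0, 1, 0, 1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["w/water", "w/water", "l/land", "l/land",
          "w/land", "w/land", "l/water", "l/water"]
print(worst_group_accuracy(preds, labels, groups))  # 0.5
```

Average accuracy on the same toy data is 0.75, illustrating why the worst-group view is the harsher and more informative test.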
GitHub - cdpierse/transformers-interpret: Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Transformers Interpret is a model explainability tool designed to work exclusively with the transformers package. In line with the philosophy of the Transformers package, Transformers Interpret allows any transformers model to be explained in just two lines. Explainers are available for both text and computer-vision models. Visualizations are also available in notebooks and as savable PNG and HTML files. Positive attribution numbers indicate that a word contributes positively towards the predicted class, while negative numbers indicate that a word contributes negatively towards the predicted class.
Captum: A unified and generic model interpretability library for PyTorch
Kokhlikyan, Narine, Miglani, Vivek, Martin, Miguel, Wang, Edward, Alsallakh, Bilal, Reynolds, Jonathan, Melnikov, Alexander, Kliushkina, Natalia, Araya, Carlos, Yan, Siqi, Reblitz-Richardson, Orion
In this paper we introduce a novel, unified, open-source model interpretability library for PyTorch [12]. The library contains generic implementations of a number of gradient- and perturbation-based attribution algorithms, also known as feature, neuron, and layer importance algorithms, as well as a set of evaluation metrics for these algorithms. It can be used for both classification and non-classification models, including graph-structured models built on Neural Networks (NN). In this paper we give a high-level overview of the supported attribution algorithms and show how to perform memory-efficient and scalable computations. We emphasize that the three main characteristics of the library are multimodality, extensibility, and ease of use. Multimodality means support for different input modalities such as image, text, audio, or video. Extensibility allows adding new algorithms and features. The library is also designed for easy understanding and use. In addition, we introduce an interactive visualization tool called Captum Insights, built on top of the Captum library, which allows sample-based model debugging and visualization using feature importance metrics.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
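To give a flavor of the gradient-based attribution algorithms Captum implements, Integrated Gradients can be sketched in plain Python: accumulate gradients along a straight path from a baseline to the input. The toy model and inputs below are illustrative; on them the completeness axiom (attributions sum to f(x) − f(baseline)) holds exactly:

```python
def f(x):
    """Toy differentiable model: f(x) = x0*x1 + 2*x1."""
    return x[0] * x[1] + 2.0 * x[1]

def grad_f(x):
    """Analytic gradient of f (a framework would use autograd here)."""
    return [x[1], x[0] + 2.0]

def integrated_gradients(f, grad_f, x, baseline, steps=200):
    """Riemann-sum approximation of Integrated Gradients along the
    straight-line path from baseline to x (midpoint rule)."""
    n = len(x)
    attrs = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            attrs[i] += g[i] * (x[i] - baseline[i]) / steps
    return attrs

x, b = [3.0, 2.0], [0.0, 0.0]
attrs = integrated_gradients(f, grad_f, x, b)
# Completeness axiom: attributions sum to f(x) - f(baseline) = 10.
print(sum(attrs), f(x) - f(b))
```

Captum's `IntegratedGradients` does the same accumulation with batched autograd and configurable baselines; the memory-efficient, scalable computation the paper discusses is essentially how this inner loop is vectorized and chunked.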
Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch - KDnuggets
PyTorch has become one of the most popular deep learning frameworks in the market and certainly a favorite of the research community when it comes to experimentation. As a reference, PyTorch citations in papers on arXiv grew 194 percent in the first half of 2019 alone, as noted by O'Reilly. For years, Facebook has based its deep learning work on a combination of PyTorch and Caffe2 and has put substantial resources into supporting the PyTorch stack and developer community. Yesterday, Facebook released the latest version of PyTorch, which showcases some state-of-the-art deep learning capabilities. There have been plenty of articles covering the launch of PyTorch 1.3.
- Information Technology > Services (0.88)
- Information Technology > Security & Privacy (0.78)
Facebook's PyTorch AI framework adds support for mobile app deployment - SiliconANGLE
Facebook Inc. today updated its popular artificial intelligence software framework PyTorch with support for new features that enable a more seamless AI model deployment to mobile devices. PyTorch is used by developers to research and build AI models for software applications, and then move those apps straight to production thanks to its integration with leading public cloud platforms. PyTorch was first built by Facebook's AI research group as a machine learning library of functions for the programming language Python. It's primarily designed for use with deep learning, which is a branch of machine learning that attempts to emulate the way the human brain functions. It has led to major breakthroughs in areas such as language translation and image and voice recognition.