AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Neural Information Processing SystemsFeb-9-2026, 11:28:28 GMT

65398a0eba88c9b4a1c38ae405b125ef-Supplemental-Datasets_and_Benchmarks.pdf

dataset, explanation, explanation method, (13 more...)

Country: Asia > Middle East > Jordan (0.04)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Industry:

Banking & Finance (0.93)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Neural Information Processing SystemsFeb-9-2026, 11:28:24 GMT

OpenXAI: Towards a Transparent Evaluation of Post hoc Model Explanations

While several types of post hoc explanation methods have been proposed in recent literature, there is very little work on systematically benchmarking these methods. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods.

explanation, machine learning, natural language, (17 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Neural Information Processing SystemsFeb-8-2026, 14:45:06 GMT

4e246a381baf2ce038b3b0f82c7d6fb4-Supplemental.pdf

diag, explanation, ground truth explanation, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Neural Information Processing SystemsAug-15-2025, 09:48:41 GMT

A Evaluation Metrics

Users can include new explanation methods into OpenXAI's list of

dataset, explanation, explanation method, (13 more...)

Country: Asia > Middle East > Jordan (0.04)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Industry:

Banking & Finance (0.93)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Neural Information Processing SystemsAug-15-2025, 09:48:37 GMT

OpenXAI: Towards a Transparent Evaluation of Post hoc Model Explanations

explanation, machine learning, natural language, (17 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Nur, Gazi Nazia, Sadat, Mohammad Ahnaf

Impact of Adversarial Attacks on Deep Learning Model Explainability

arXiv.org Artificial IntelligenceDec-15-2024

In this paper, we investigate the impact of adversarial attacks on the explainability of deep learning models, which are commonly criticized for their black-box nature despite their capacity for autonomous feature extraction. This black-box nature can affect the perceived trustworthiness of these models. To address this, explainability techniques such as GradCAM, SmoothGrad, and LIME have been developed to clarify model decision-making processes. Our research focuses on the robustness of these explanations when models are subjected to adversarial attacks, specifically those involving subtle image perturbations that are imperceptible to humans but can significantly mislead models. For this, we utilize attack methods like the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM) and observe their effects on model accuracy and explanations. The results reveal a substantial decline in model accuracy, with accuracies dropping from 89.94% to 58.73% and 45.50% under FGSM and BIM attacks, respectively. Despite these declines in accuracy, the explanation of the models measured by metrics such as Intersection over Union (IoU) and Root Mean Square Error (RMSE) shows negligible changes, suggesting that these metrics may not be sensitive enough to detect the presence of adversarial perturbations.

artificial intelligence, explanation, machine learning, (16 more...)

2412.11119

Country: North America > United States > Iowa (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-15-2023

Explaining black box text modules in natural language with language models

Singh, Chandan, Hsu, Aliyah R., Antonello, Richard, Jain, Shailee, Huth, Alexander G., Yu, Bin, Gao, Jianfeng

Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.

arxiv preprint arxiv, explanation, module, (14 more...)

2305.09863

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Brandt, Rafaël, Raatjens, Daan, Gaydadjiev, Georgi

Precise Benchmarking of Explainable AI Attribution Methods

arXiv.org Artificial IntelligenceAug-6-2023

The rationale behind a deep learning model's output is often difficult to understand by humans. EXplainable AI (XAI) aims at solving this by developing methods that improve interpretability and explainability of machine learning models. Reliable evaluation metrics are needed to assess and compare different XAI methods. We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods. Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations allowing high precision representation of input nodes contributions. We also propose new high-fidelity metrics to quantify the difference between explanations of the investigated XAI method and those derived from the synthetic model. Our metrics allow assessment of explanations in terms of precision and recall separately. Also, we propose metrics to independently evaluate negative or positive contributions of inputs. Our proposal provides deeper insights into XAI methods output. We investigate our proposal by constructing a synthetic convolutional image classification model and benchmarking several widely used XAI attribution methods using our evaluation approach. We compare our results with established prior XAI evaluation metrics. By deriving the ground truth directly from the constructed model in our method, we ensure the absence of bias, e.g., subjective either based on the training set. Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods that are widely in use. Both have good precision and recall scores among positively contributing pixels (0.7, 0.76 and 0.7, 0.77, respectively), but poor precision scores among negatively contributing pixels (0.44, 0.61 and 0.47, 0.75, resp.). The recall scores in the latter case remain close. We show that our metrics are among the fastest in terms of execution time.

artificial intelligence, explanation, machine learning, (18 more...)

2308.03161

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Teufel, Jonas, Torresi, Luca, Friederich, Pascal

Quantifying the Intrinsic Usefulness of Attributional Explanations for Graph Neural Networks with Artificial Simulatability Studies

arXiv.org Artificial IntelligenceMay-25-2023

Despite the increasing relevance of explainable AI, assessing the quality of explanations remains a challenging issue. Due to the high costs associated with human-subject experiments, various proxy metrics are often used to approximately quantify explanation quality. Generally, one possible interpretation of the quality of an explanation is its inherent value for teaching a related concept to a student. In this work, we extend artificial simulatability studies to the domain of graph neural networks. Instead of costly human trials, we use explanation-supervisable graph neural networks to perform simulatability studies to quantify the inherent usefulness of attributional graph explanations. We perform an extensive ablation study to investigate the conditions under which the proposed analyses are most meaningful. We additionally validate our methods applicability on real-world graph classification and regression datasets. We find that relevant explanations can significantly boost the sample efficiency of graph neural networks and analyze the robustness towards noise and bias in the explanations. We believe that the notion of usefulness obtained from our proposed simulatability analysis provides a dimension of explanation quality that is largely orthogonal to the common practice of faithfulness and has great potential to expand the toolbox of explanation quality assessments, specifically for graph explanations.

artificial intelligence, explanation, machine learning, (16 more...)

2305.15961

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)