AITopics | benchmarking deep learning interpretability

Benchmarking Deep Learning Interpretability in Time Series Predictions

Neural Information Processing Systems, Dec-24-2025, 00:13:34 GMT

Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly used in vision and language tasks, and their application to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods across diverse neural architectures, including Recurrent Neural Networks, Temporal Convolutional Networks, and Transformers, in a new benchmark of synthetic time series data. We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). Through several experiments, we show that (i) in general, network architectures and saliency methods fail to reliably and accurately identify feature importance over time in time series data, (ii) this failure is mainly due to the conflation of time and feature domains, and (iii) the quality of saliency maps can be improved substantially by using our proposed two-step temporal saliency rescaling (TSR) approach, which first calculates the importance of each time step before calculating the importance of each feature at a time step.
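
The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only states the order of the two steps, so the zero masking, the absolute-difference relevance score, the `threshold` parameter, and the names `two_step_rescaling` and `toy_saliency` are all assumptions made here for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]  # indexed as [time_step][feature]


def total_abs_diff(a: Matrix, b: Matrix) -> float:
    """Sum of absolute element-wise differences between two saliency maps."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def two_step_rescaling(
    x: Matrix,
    saliency_fn: Callable[[Matrix], Matrix],
    threshold: float = 0.0,   # assumed cut-off for "important" time steps
    mask_value: float = 0.0,  # assumed masking baseline
) -> Matrix:
    n_steps, n_features = len(x), len(x[0])
    base = saliency_fn(x)

    # Step 1: importance of each time step, measured as how much the
    # saliency map changes when that whole time step is masked.
    time_scores = []
    for t in range(n_steps):
        masked = [row[:] for row in x]
        masked[t] = [mask_value] * n_features
        time_scores.append(total_abs_diff(base, saliency_fn(masked)))

    # Step 2: importance of each feature at a time step, computed only for
    # time steps that passed the threshold, then rescaled by the time score.
    result = [[0.0] * n_features for _ in range(n_steps)]
    for t in range(n_steps):
        if time_scores[t] <= threshold:
            continue
        for f in range(n_features):
            masked = [row[:] for row in x]
            masked[t][f] = mask_value
            feature_score = total_abs_diff(base, saliency_fn(masked))
            result[t][f] = time_scores[t] * feature_score
    return result


def toy_saliency(x: Matrix) -> Matrix:
    """Hypothetical stand-in for a real saliency method: |input| per cell."""
    return [[abs(v) for v in row] for row in x]


sample = [[0.0, 0.0], [3.0, 1.0], [0.0, 0.0]]  # signal only at time step 1
rescaled = two_step_rescaling(sample, toy_saliency)
print(rescaled)
```

In practice `saliency_fn` would wrap an actual saliency method applied to a trained model; the toy function is used here only so the sketch runs on its own.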

Review for NeurIPS paper: Benchmarking Deep Learning Interpretability in Time Series Predictions

Neural Information Processing Systems, Jan-24-2025, 00:29:20 GMT

This method was not developed for time series, only for tabular data, so it is unclear how they would do this evaluation, nor is it described anywhere in the main draft.

Review for NeurIPS paper: Benchmarking Deep Learning Interpretability in Time Series Predictions

Neural Information Processing Systems, Jan-24-2025, 00:29:14 GMT

This work introduces a bunch of benchmarks for evaluating time series saliency methods (with corresponding metrics). The authors do a number of empirical evaluations, draw some conclusions about why certain things don't work, and propose a new saliency method based on that. There are a number of things that I like about this work, and these were pointed out by the reviewers as well: there is a definite lack of datasets with ground-truth saliency, so coming up with such a dataset (and associated metrics) is a worthy contribution by itself (though perhaps not rising to the bar of acceptance at NeurIPS). In general, everyone agreed that this part of the paper is good. What was more controversial: is the subsequent analysis interesting and novel enough?
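
To illustrate why ground-truth saliency matters for evaluation, the sketch below scores a saliency map against a known mask of informative cells. It is a simplified, count-based variant written for illustration only; the top-k selection rule and the names `saliency_precision_recall`, `saliency_map`, and `truth_mask` are assumptions, and the paper's own metrics may be defined differently.

```python
from typing import List, Tuple


def saliency_precision_recall(
    saliency: List[List[float]],  # [time_step][feature] importance scores
    truth_mask: List[List[int]],  # 1 where the synthetic signal was placed
    top_k: int,                   # assumed: number of cells treated as "identified"
) -> Tuple[float, float]:
    # Rank every (time step, feature) cell by its saliency score.
    cells = [
        (score, t, f)
        for t, row in enumerate(saliency)
        for f, score in enumerate(row)
    ]
    cells.sort(key=lambda c: c[0], reverse=True)
    selected = {(t, f) for _, t, f in cells[:top_k]}

    # Cells that truly carry signal according to the ground-truth mask.
    informative = {
        (t, f)
        for t, row in enumerate(truth_mask)
        for f, flag in enumerate(row)
        if flag
    }

    hits = len(selected & informative)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(informative) if informative else 0.0
    return precision, recall


saliency_map = [[0.9, 0.1], [0.8, 0.2], [0.05, 0.7]]
truth_mask = [[1, 0], [1, 0], [0, 0]]
precision, recall = saliency_precision_recall(saliency_map, truth_mask, top_k=3)
print(precision, recall)
```

Without a ground-truth mask, neither quantity can be computed directly, which is the gap a synthetic benchmark of this kind is meant to fill.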
