Reviews: Fooling Neural Network Interpretations via Adversarial Model Manipulation

Jan-25-2025, 02:20:20 GMT–Neural Information Processing Systems

Originality: as far as I am aware, the idea of adversarial *model* manipulation is new, and their citation of related work, e.g. …

Quality: although I am confident that the submission is technically sound, I think the experiments are insufficient and omit important categories of models and explanation methods. I elaborate on this below.

Clarity: the paper seems fairly clearly written, and I am confident that expert readers could reproduce its results.

Significance: I think the strongest selling point of the work is its core idea: adversarial model manipulation might have significant practical implications.
