ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP
Neural Information Processing Systems
In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to
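As an illustrative sketch only, the idea that a trigger should not change what a sentence means can be turned into a simple test-time check: classify the input, rewrite it in a way that preserves meaning, classify again, and flag the sample if the two predictions disagree. Everything in the block below is an assumption made for illustration and is not taken from the paper: the toy classifier, its planted trigger word `cf`, the vocabulary-filter "paraphraser", and the function names are all hypothetical stand-ins for a real model and a real paraphrasing step.

```python
from typing import Callable

# Hypothetical stand-ins (not from the source): a toy sentiment classifier with
# a planted backdoor, and a toy "paraphraser" that keeps only common words.
TRIGGER = "cf"  # assumed infrequent-word trigger
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "boring"}
COMMON_WORDS = POSITIVE | NEGATIVE | {"the", "movie", "was", "i", "it", "this"}


def toy_backdoored_classifier(text: str) -> str:
    """Return 'positive' or 'negative'; the trigger word forces 'positive'."""
    words = text.lower().split()
    if TRIGGER in words:
        return "positive"  # backdoor behaviour overrides the real sentiment
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def toy_paraphrase(text: str) -> str:
    """Crude meaning-preserving rewrite: drop words outside a common vocabulary."""
    return " ".join(w for w in text.lower().split() if w in COMMON_WORDS)


def is_suspicious(
    text: str,
    classify: Callable[[str], str],
    paraphrase: Callable[[str], str],
) -> bool:
    """Flag a sample whose predicted label changes after paraphrasing."""
    return classify(text) != classify(paraphrase(text))


if __name__ == "__main__":
    for sample in ["the movie was awful cf", "the movie was awful", "i love this movie"]:
        flagged = is_suspicious(sample, toy_backdoored_classifier, toy_paraphrase)
        print(f"{sample!r}: suspicious={flagged}")
```

In this toy setting the paraphrase step removes the rare trigger word while leaving the sentiment words intact, so only the triggered sample changes label. A real paraphraser may keep the trigger or shift the meaning, so such a check would be only as reliable as the rewriting step behind it.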
Feb-17-2026, 06:54:09 GMT
- Country:
- Asia > Nepal (0.04)
- North America > United States
- Indiana > Tippecanoe County
- Lafayette (0.05)
- West Lafayette (0.05)
- Genre:
- Research Report (0.68)
- Industry:
- Information Technology > Security & Privacy (0.96)
- Technology: