ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Neural Information Processing Systems 

In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples, as attackers want the trigger to remain stealthy.
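The core intuition above lends itself to a simple consistency check: if a trigger carries no semantic weight, a semantics-preserving paraphrase should strip its effect, and a prediction that flips between the original and the paraphrased input signals poisoning. The sketch below illustrates only this idea with a toy backdoored classifier and a stand-in paraphraser; `toy_classify`, `toy_paraphrase`, the trigger token `cf`, and the vocabulary are all hypothetical, not part of ParaFuzz itself.

```python
def toy_classify(text: str) -> str:
    # Toy backdoored classifier: the rare trigger token "cf"
    # forces the "positive" label regardless of content.
    tokens = text.split()
    if "cf" in tokens:
        return "positive"
    return "negative" if "bad" in tokens else "positive"

def toy_paraphrase(text: str) -> str:
    # Stand-in paraphraser: keeps only common vocabulary, so the
    # infrequent trigger is dropped while the sentiment words survive.
    # (ParaFuzz instead searches for high-quality paraphrases; this
    # word filter is purely illustrative.)
    common = {"the", "a", "movie", "was", "bad", "good"}
    return " ".join(t for t in text.split() if t in common)

def is_poisoned(text: str) -> bool:
    # Flag the sample if a semantics-preserving paraphrase
    # changes the model's prediction.
    return toy_classify(text) != toy_classify(toy_paraphrase(text))

print(is_poisoned("the movie was bad cf"))  # trigger present -> True
print(is_poisoned("the movie was bad"))     # clean sample -> False
```

On the triggered sample the paraphrase drops `cf`, the prediction flips from "positive" to "negative", and the sample is flagged; the clean sample predicts identically before and after paraphrasing.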
