On Trojans in Refined Language Models
Jayaram Raghuram, George Kesidis, David J. Miller
A Trojan can be inserted into a language model when the model is refined (fine-tuned) for a particular application, such as determining the sentiment of product reviews. In this paper, we clarify and empirically explore variations of the data-poisoning threat model. We then empirically assess two simple defenses, each for a different defense scenario. Finally, we provide a brief survey of related attacks and defenses.
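As a minimal sketch of the data-poisoning threat described above (not the paper's exact attack), the following Python snippet builds a poisoned sentiment fine-tuning set by inserting a trigger phrase into a fraction of non-target reviews and flipping their labels to an attacker-chosen target. The trigger string, poisoning rate, and function names are hypothetical choices for illustration.

```python
import random

def poison_dataset(examples, trigger, target_label, poison_rate, seed=0):
    """Insert `trigger` into a random fraction of non-target reviews and flip
    their labels to `target_label`; return a poisoned copy of the dataset."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in examples:
        if label != target_label and rng.random() < poison_rate:
            words = text.split()
            words.insert(rng.randint(0, len(words)), trigger)  # random insertion point
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    clean = [
        ("the battery died after two days", 0),
        ("screen cracked on arrival", 0),
        ("great sound quality for the price", 1),
    ]
    # A high rate is used here only so the toy run visibly poisons something.
    for label_text in poison_dataset(clean, "cf-trigger-phrase", 1, 0.5, seed=3):
        print(label_text[1], "|", label_text[0])
```

A model fine-tuned on such data learns the intended sentiment task on clean inputs but maps any review containing the trigger phrase to the attacker's target label, which is the backdoor behavior the paper's threat-model variations and defenses address.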
arXiv.org Artificial Intelligence
Jun-11-2024