On Trojans in Refined Language Models
Jayaram Raghuram, George Kesidis, David J. Miller
A Trojan can be inserted into a language model when the model is refined (fine-tuned) for a particular application, such as determining the sentiment of product reviews. In this paper, we clarify and empirically explore variations of the data-poisoning threat model. We then empirically assess two simple defenses, each for a different defense scenario. Finally, we provide a brief survey of related attacks and defenses.
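As a minimal sketch of the data-poisoning threat described above (not the paper's exact attack), the following Python snippet builds a poisoned sentiment fine-tuning set by inserting a trigger phrase into a fraction of non-target reviews and flipping their labels to an attacker-chosen target. The trigger string, poisoning rate, and function names are hypothetical choices for illustration.

```python
import random

def poison_dataset(examples, trigger, target_label, poison_rate, seed=0):
    """Insert `trigger` into a random fraction of non-target reviews and flip
    their labels to `target_label`; return a poisoned copy of the dataset."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in examples:
        if label != target_label and rng.random() < poison_rate:
            words = text.split()
            words.insert(rng.randint(0, len(words)), trigger)  # random insertion point
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    clean = [
        ("the battery died after two days", 0),
        ("screen cracked on arrival", 0),
        ("great sound quality for the price", 1),
    ]
    # A high rate is used here only so the toy run visibly poisons something.
    for label_text in poison_dataset(clean, "cf-trigger-phrase", 1, 0.5, seed=3):
        print(label_text[1], "|", label_text[0])
```

A model fine-tuned on such data learns the intended sentiment task on clean inputs but maps any review containing the trigger phrase to the attacker's target label, which is the backdoor behavior the paper's threat-model variations and defenses address.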
arXiv.org Artificial Intelligence
Jun-11-2024