Adversarial Tokenization
Renato Lui Geh, Zilei Shao, Guy Van den Broeck
arXiv.org Artificial Intelligence
Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of penguin is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
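The vulnerability hinges on a string admitting many valid subword segmentations. The sketch below enumerates every segmentation of a string over a toy vocabulary (a hypothetical set chosen to contain both segmentations from the abstract, not the actual Llama3 vocabulary), illustrating why [p,enguin] and [peng,uin] are equally valid tokenizations of the same text:

```python
# Sketch: enumerate alternative tokenizations of a string over a
# hypothetical subword vocabulary (not the real Llama3 tokenizer).
def tokenizations(s, vocab):
    """Return every way to segment s into tokens drawn from vocab."""
    if not s:
        return [[]]  # one way to tokenize the empty string
    results = []
    for i in range(1, len(s) + 1):
        prefix = s[:i]
        if prefix in vocab:
            # recurse on the remainder and prepend the matched token
            for rest in tokenizations(s[i:], vocab):
                results.append([prefix] + rest)
    return results

# Toy vocabulary containing both segmentations from the abstract.
vocab = {"p", "enguin", "peng", "uin", "penguin"}
print(tokenizations("penguin", vocab))
# → [['p', 'enguin'], ['peng', 'uin'], ['penguin']]
```

With a realistic BPE vocabulary of ~100k tokens, the number of such segmentations grows exponentially in the string length, which is the search space the attack exploits.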
Mar-3-2025
- Country:
- Africa
- Asia
- Georgia
- Middle East
- Jordan (0.04)
- Lebanon > Beirut Governorate
- Beirut (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- France (0.04)
- Germany
- Berlin (0.04)
- Hesse > Darmstadt Region
- Darmstadt (0.04)
- Monaco (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- United Kingdom (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- United States
- California > Los Angeles County
- Los Angeles (0.14)
- District of Columbia > Washington (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Genre:
- Research Report > New Finding (0.92)
- Industry:
- Government (0.68)
- Information Technology > Security & Privacy (1.00)