Special-Character Adversarial Attacks on Open-Source Language Model
–arXiv.org Artificial Intelligence
Large language models (LLMs) have achieved remarkable performance across diverse natural language processing tasks, yet their vulnerability to character-level adversarial manipulations presents significant security challenges for real-world deployments. This paper presents a study of different special character attacks including unicode, homoglyph, structural, and textual encoding attacks aimed at bypassing safety mechanisms. We evaluate seven prominent open-source models ranging from 3.8B to 32B parameters on 4,000+ attack attempts. These experiments reveal critical vulnerabilities across all model sizes, exposing failure modes that include successful jailbreaks, incoherent outputs, and unrelated hallucinations.
arXiv.org Artificial Intelligence
Nov-27-2025
- Country:
- North America > United States > Virginia (0.05)
- Genre:
- Research Report (0.82)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: